Mobility Technologies Co., Ltd.
3D Perception for Autonomous Driving
- Datasets and Algorithms -
Kazuyuki Miyazawa
AI R&D Group 2, AI System Dept.
Mobility Technologies Co., Ltd.
Who am I?
@kzykmyzw
Kazuyuki Miyazawa
Group Leader
AI R&D Group 2
AI System Dept.
Mobility Technologies Co., Ltd.
Past Work Experience
April 2019 - March 2020
AI Research Engineer@DeNA Co., Ltd.
April 2010 - March 2019
Research Scientist@Mitsubishi Electric Corp.
Education
PhD in Information Science@Tohoku University
Agenda
1 Autonomous Driving Datasets
2 3D Object Detection Algorithms
3D Object Detection: Motivation
■ 2D bounding boxes are not sufficient
■ Lack of 3D pose, occlusion information, and 3D location
Preliminary (Today’s Main Topic)
2D Object Detection 3D Object Detection
http://www.cs.toronto.edu/~byang/
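To make the "3D" concrete: a 3D detector typically outputs a 7-DOF box (center x, y, z; size l, w, h; heading yaw), versus four values for a 2D box. A minimal sketch of expanding such a box to its eight corners, assuming an x-forward, z-up axis convention (the function name and convention are illustrative, not from the slides):

```python
import numpy as np

def box3d_corners(x, y, z, l, w, h, yaw):
    """Expand a 7-DOF box (center, size, heading) to its 8 corners.

    Axis convention assumed for illustration: x forward, y left, z up,
    with yaw rotating about the z-axis.
    """
    dx, dy, dz = l / 2, w / 2, h / 2
    corners = np.array([[sx * dx, sy * dy, sz * dz]
                        for sx in (-1, 1)
                        for sy in (-1, 1)
                        for sz in (-1, 1)])
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])  # rotation about z
    return corners @ R.T + np.array([x, y, z])
```

Rotating the box by 90 degrees swaps the roles of its length and width extents, which is exactly the pose information a 2D box cannot express.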
01 Autonomous Driving Datasets
KITTI [2012]
Sensor Setup
● GPS/IMU x 1
● LiDAR (64ch) x 1
● Grayscale Camera (1.4M) x 2
● Color Camera (1.4M) x 2
http://www.cvlibs.net/datasets/kitti/
3D Object Detection
● 7,481 training images / point clouds
● 7,518 test images / point clouds
● 80,256 labeled objects
type: Car, Van, Truck, Pedestrian, Person_sitting, Cyclist, Tram, Misc, or DontCare
truncated: 0 to 1, where truncated refers to the object leaving the image boundaries
occluded: 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown
alpha: observation angle of the object, ranging [-pi..pi]
bbox: 2D bounding box of the object in the image
dimensions: 3D object dimensions: height, width, length
location: 3D object location x, y, z in camera coordinates
rotation_y: rotation ry around the Y-axis in camera coordinates [-pi..pi]
Annotations
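The annotation fields above map one-to-one onto the 15 space-separated values of a KITTI label line. A minimal parser sketch (the sample line is fabricated for illustration, in KITTI's field order):

```python
def parse_kitti_label(line):
    """Parse one line of a KITTI object label file into a dict.

    Field order follows the KITTI object devkit: type, truncated,
    occluded, alpha, bbox (4), dimensions h/w/l (3), location x/y/z (3),
    rotation_y.
    """
    f = line.split()
    return {
        "type": f[0],
        "truncated": float(f[1]),
        "occluded": int(f[2]),
        "alpha": float(f[3]),
        "bbox": [float(v) for v in f[4:8]],         # left, top, right, bottom
        "dimensions": [float(v) for v in f[8:11]],  # height, width, length
        "location": [float(v) for v in f[11:14]],   # x, y, z (camera coords)
        "rotation_y": float(f[14]),
    }

# Fabricated example line in the KITTI format
obj = parse_kitti_label(
    "Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 "
    "1.65 1.67 3.64 -0.65 1.71 46.70 -1.59")
```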
License
Variants of KITTI
SemanticKITTI Dataset provides
annotations that associate each LiDAR
point with one of 28 semantic classes in all
22 sequences of the KITTI Dataset
http://semantic-kitti.org/
Virtual KITTI contains 50 high-resolution
monocular videos (21,260 frames)
generated from five different virtual worlds
in urban settings under different imaging
and weather conditions
https://europe.naverlabs.com/research/computer-vision/proxy-virtual-worlds/
ApolloScape [2017]
Sensor Setup
● GPS/IMU x 1
● LiDAR x 2
● Color Camera (9.2M) x 2
http://apolloscape.auto/
Scene Parsing
3D Car Instance
Lane Segmentation
Self-Localization / Stereo
3D Object Detection
● 53 min training sequences
● 50 min testing sequences
● 70K 3D fitted cars
type: Small vehicle, Big vehicle, Pedestrian, Motorcyclist and Bicyclist, Traffic cones, Others
dimensions: 3D object dimensions: height, width, length
location: 3D object location x, y, z in relative coordinates
heading: heading angle in radians with respect to the object's direction
Annotations
■ To the extent that we authorize the Developer to use Datasets and subject to the terms of this
Agreement, the Developer is entitled to use the Datasets only (i) for Developer’s internal
purposes of non-commercial research or teaching and (ii) in accordance with the terms of this
Agreement.
License
http://apolloscape.auto/license.html
nuScenes [2019]
Sensor Setup
● GPS/IMU x 1
● LiDAR (32ch) x 1
● RADAR x 5
● Color Camera (1.4M) x 6
https://www.nuscenes.org/
Semantic Map
● Provide highly accurate human-annotated
semantic maps of the relevant areas
● 11 semantic classes
● Encourage the use of localization and
semantic maps as strong priors for all tasks
3D Object Detection
● category
● attribute
● visibility
● instance
● sensor
● calibrated_sensor
● ego_pose
● log
● scene
● sample
● sample_data
● sample_annotation
● map
Number of annotations per category
Attributes distribution for selected categories
1.4M boxes in total
License
Argoverse [2019]
Sensor Setup
● GPS x 1
● LiDAR (32ch) x 2
● Color Camera (4.8M) x 2
● Color Camera (2M) x 7
https://www.argoverse.org/
Argoverse Maps
Vector Map: Lane-Level Geometry
Rasterized Map: Ground Height
Rasterized Map: Drivable Area
3D Object Detection (3D Tracking)
● Collection of 113 log segments with
3D object tracking annotations
● These log segments vary in length
from 15 to 30 seconds and contain
a total of 11,052 tracks
● Each sequence includes
annotations for all objects within 5
meters of “drivable area” — the
area in which it is possible for a
vehicle to drive
License
Lyft Level 5 [2019]
Sensor Setup (BETA_V0)
● LiDAR (40ch) x 3
● WFOV Camera (1.2M) x 6
● Long-focal-length Camera (1.7M) x 1
Sensor Setup (BETA_++)
● LiDAR (64ch) x 1
● LiDAR (40ch) x 2
● WFOV Camera (2M) x 6
● Long-focal-length Camera (2M) x 1
https://level5.lyft.com/dataset/
Semantic Map
3D Object Detection (Same format as nuScenes)
● category
● attribute
● visibility
● instance
● sensor
● calibrated_sensor
● ego_pose
● log
● scene
● sample
● sample_data
● sample_annotation
● map
animal
bicycle
bus
car
emergency_vehicle
motorcycle
other_vehicle
pedestrian
truck
638K boxes in total
License
Audi Autonomous Driving Dataset (A2D2) [2020]
Sensor Setup
● GPS/IMU x 1
● LiDAR (16ch) x 5
● Color Camera (2.3M) x 6
https://www.a2d2.audi/a2d2/en.html
3D Object Detection
● All images have corresponding
LiDAR point clouds, of which
12,497 are annotated with 3D
bounding boxes within the field
of view of the front-center
camera
License
Comparison
? ? ? (the leading dataset in each column, revealed on the next slide)
These figures are based on Table 1 in https://arxiv.org/abs/1912.04838
Comparison
Waymo, Waymo, Waymo (the Waymo Open Dataset leads every column)
These figures are based on Table 1 in https://arxiv.org/abs/1912.04838
Waymo Open Dataset [2019]
Sensor Setup
● Mid-Range (~75m) LiDAR x 1
● Short-Range (~20m) LiDAR x 4
● Color Camera (2M) x 3
● Color Camera (1.6M) x 2
https://waymo.com/open/
Data Volume
● Train: 798 segments w/ labels (757 GB)
● Validation: 202 segments w/ labels (144 GB)
● Test: 150 segments w/o labels (192 GB)
● Contains 1,150 segments that each span 20 seconds
● Additionally, segments from a new location, only a subset of which have labels, are provided for domain adaptation
Data Format
Each Segment consists of Frames, and each Frame contains:
● context: shared information among all frames in the scene (e.g., calibration parameters, stats)
● timestamp_micros: frame timestamp
● pose: vehicle pose
● images: camera images and metadata (e.g., pose, velocity, timestamp)
● lasers: range images
● laser_labels: 3D box annotations
● projected_lidar_labels: LiDAR labels (laser_labels) projected to camera images
● camera_labels: 2D box annotations
● no_label_zones: polygons representing areas without labels (e.g., opposite side of a highway)
● Each segment (20 sec) consists of ~200 frames (10 Hz)
● All the data for a segment is stored in a single tfrecord and represented as protocol buffers
Range Image
The point cloud of each LiDAR is encoded as a range image with two returns (1st return and 2nd return), each carrying three channels: range, intensity, and elongation
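A range image pixel at (beam, azimuth) with a valid range value maps back to a 3D point by spherical-to-Cartesian conversion. A simplified sketch (the actual Waymo tooling additionally applies per-beam calibration and ego-pose corrections, which are omitted here):

```python
import numpy as np

def range_image_to_points(range_img, inclinations, azimuths):
    """Convert an (H, W) range image to an (N, 3) point cloud.

    Plain spherical-to-Cartesian conversion; per-beam extrinsics and
    rolling-shutter/pose corrections are deliberately left out.
    """
    incl = inclinations[:, None]  # (H, 1) vertical angle of each beam row
    az = azimuths[None, :]        # (1, W) horizontal angle of each column
    x = range_img * np.cos(incl) * np.cos(az)
    y = range_img * np.cos(incl) * np.sin(az)
    z = range_img * np.sin(incl)
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[range_img.reshape(-1) > 0]  # drop no-return pixels
```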
API & Tutorial in Colab
https://github.com/waymo-research/waymo-open-dataset
https://colab.research.google.com/github/waymo-research/waymo-open-dataset/blob/master/tutorial/tutorial.ipynb
Data Visualization (LiDAR Point Cloud)
Mid-range LiDAR point cloud, shown alone and overlaid with each short-range LiDAR (front, right, rear, left), and with all four short-range LiDARs together
Data Visualization (Camera Images)
Front Left: 1920x1080
Front: 1920x1080
Front Right: 1920x1080
Side Left: 1920x886
Side Right: 1920x886
3D Object Detection
■ 3D LiDAR Labels
■ 3D 7-DOF bounding boxes in the vehicle frame with globally unique tracking IDs
■ vehicles, pedestrians, cyclists, signs
■ 2D Camera Labels
■ Not projections of the 3D labels
■ vehicles, pedestrians, cyclists
■ Tight-fitting, axis-aligned 2D bounding boxes with globally unique tracking IDs
Labeled object and tracking ID counts
3D objects: Vehicle 6.1M, Pedestrian 2.8M, Cyclist 67K, Sign 3.2M
3D track IDs: Vehicle 60K, Pedestrian 23K, Cyclist 620, Sign 23K
2D objects: Vehicle 7.7M, Pedestrian 2.1M, Cyclist 63K, Sign -
2D track IDs: Vehicle 164K, Pedestrian 45K, Cyclist 1.3K, Sign -
2D Label Samples
3D Label Samples
LiDAR to Camera Projection
■ Camera and LiDAR data are well synchronized
■ LiDAR points can be projected onto camera images with rolling-shutter compensation
Challenges
Evaluation Metrics for 3D Object Detection
https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-ranked-retrieval-results-1.html
P/R Curve | Average Precision with Heading (APH)
Each true positive is weighted by its heading accuracy, which is 1 for a perfect heading and decreases with the angular error between the predicted and ground-truth headings
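In code, the heading-accuracy weight might look as follows. The exact normalization is an assumption here (see the Waymo Open Dataset paper for the authoritative definition); the key property is that a perfect heading scores 1 and larger wrapped angular errors score less:

```python
import numpy as np

def heading_accuracy(pred_yaw, gt_yaw):
    """Heading-accuracy weight for a true positive (APH-style sketch).

    Assumed formulation for illustration: wrap the heading error to
    [0, pi], then map it linearly so a perfect heading scores 1 and an
    error of pi scores 0.
    """
    err = np.abs(pred_yaw - gt_yaw) % (2 * np.pi)
    err = np.minimum(err, 2 * np.pi - err)  # wrapped error in [0, pi]
    return 1.0 - err / np.pi
```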
To ensure the Dataset is only used for Non-Commercial Purposes, You agree
■ Not to distribute or publish any models trained on or refined using the Dataset,
or the weights or biases from such trained models
■ Not to use or deploy the Dataset, any models trained on or refined using the
Dataset, or the weights or biases from such trained models (i) in operation of a
vehicle or to assist in the operation of a vehicle, (ii) in any Production Systems,
or (iii) for any other primarily commercial purposes
License
https://waymo.com/open/terms/
02 3D Object Detection Algorithms
■ Design a novel type of neural network that directly consumes point clouds, which well respects
the permutation invariance of points in the input
■ Provide a unified architecture for applications ranging from object classification, part
segmentation, to scene semantic parsing
PointNet [C. Qi+, CVPR2017]
https://arxiv.org/abs/1612.00593
PointNet Architecture
Predict an affine transformation matrix with a mini-network and align all input sets to achieve invariance against geometric transformations
The same alignment approach
is also applied in feature space
Using max pooling as a symmetric function, aggregate the unordered point features
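The order-invariance that max pooling buys can be demonstrated with a toy stand-in for the shared MLP (a single linear layer plus ReLU; purely illustrative, not the paper's architecture):

```python
import numpy as np

def global_feature(points, W):
    """Toy PointNet core: a shared per-point map followed by max pooling.

    W stands in for the learned shared MLP (one linear layer + ReLU).
    Max pooling is a symmetric function, so the result does not depend
    on the order of the input points.
    """
    pointwise = np.maximum(points @ W, 0)  # (N, C) per-point features
    return pointwise.max(axis=0)           # (C,) global descriptor
```

Shuffling the rows of `points` leaves the output unchanged, which is exactly the permutation invariance the slide refers to.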
■ Divide a point cloud into 3D voxels and transform them into a unified feature representation
■ The descriptive volumetric representation is then connected to an RPN to generate detections
VoxelNet [Y. Zhou+, CVPR2018]
A voxel represents a value
on a regular grid in three-
dimensional space
https://en.wikipedia.org/wiki/Voxel
LiDAR ONLY
https://arxiv.org/abs/1711.06396
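The grouping of points into voxels can be sketched as a dictionary keyed by integer grid coordinates (this ignores VoxelNet's per-voxel random sampling and buffer limits; names are illustrative):

```python
import numpy as np

def voxelize(points, voxel_size, range_min):
    """Group (N, 3) points by the voxel cell they fall into.

    Returns {(ix, iy, iz): (M, 3) array}. A minimal sketch of the
    grouping step; the paper additionally caps and randomly samples the
    points kept per voxel, which is omitted here.
    """
    idx = np.floor((points - range_min) / voxel_size).astype(int)
    voxels = {}
    for p, i in zip(points, idx):
        voxels.setdefault(tuple(i), []).append(p)
    return {k: np.stack(v) for k, v in voxels.items()}
```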
Voxel Feature Encoding (VFE) Layer
● VFE enables inter-point interaction within a voxel by combining point-wise features with a locally aggregated feature
● Stacking multiple VFE layers allows
learning complex features for
characterizing local 3D shape information
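A single VFE layer, sketched with a plain linear layer standing in for the learned fully connected network (illustrative, not the paper's exact architecture):

```python
import numpy as np

def vfe_layer(voxel_points, W):
    """One Voxel Feature Encoding layer, sketched with a linear map.

    voxel_points: (N, Cin) points belonging to a single voxel.
    Each point goes through a shared linear layer + ReLU, the results
    are max-aggregated over the voxel, and the aggregate is concatenated
    back onto every point-wise feature, enabling inter-point interaction.
    """
    pointwise = np.maximum(voxel_points @ W, 0)        # (N, C)
    aggregated = pointwise.max(axis=0, keepdims=True)  # (1, C) voxel summary
    tiled = np.repeat(aggregated, len(voxel_points), axis=0)
    return np.concatenate([pointwise, tiled], axis=1)  # (N, 2C)
```

Stacking several of these (as the slide describes) lets each point's feature repeatedly mix with the voxel-level summary.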
Convolutional Middle Layers
● Each convolutional middle layer applies 3D
convolution, BN layer, and ReLU layer
sequentially
● Convolutional middle layers aggregate
voxel-wise features within a progressively
expanding receptive field, adding more
context to the shape description
Region Proposal Network
● The first layer of each block downsamples the input feature map
● Then the output of every block is upsampled to a fixed size and
concatenated to construct the high resolution feature map
● Finally, this feature map is mapped to the desired learning targets
Evaluation on KITTI
Performance comparison on KITTI validation set
Performance comparison on KITTI test set
■ Apply sparse convolution to greatly increase the speeds of training and inference
■ Introduce a novel angle loss regression approach to solve the problem of the large loss
generated when the angle prediction error is equal to π
SECOND (Sparsely Embedded CONvolutional Detection) [Y. Yan+, Sensors 2018]
LiDAR ONLY
https://pdfs.semanticscholar.org/5125/a16039cabc6320c908a4764f32596e018ad3.pdf
Sparse Convolution Algorithm
■ Gather the necessary input to construct the matrix, perform GEMM, then scatter the data back
■ GPU-based rule generation algorithm is proposed to construct input–output index rule matrix
■ Directly predicting the radian offset suffers from an adversarial example problem between the
cases of 0 and π radians because they correspond to the same box but generate a large loss
when one is misidentified as the other
■ Solve this problem by introducing a new angle loss regression: Lθ = SmoothL1(sin(θp - θt))
■ To address the issue that this loss treats boxes with opposite directions as being the same, a
simple direction classifier is added to the output of the RPN
Sine-Error Loss for Angle Regression
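The sine-error loss from the SECOND paper can be sketched as follows; note how a prediction off by exactly π produces (numerically) zero loss, which is why the separate direction classifier is still needed:

```python
import numpy as np

def smooth_l1(x):
    """Standard smooth-L1: quadratic below 1, linear above."""
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x * x, ax - 0.5)

def angle_loss(pred, target):
    """SECOND's sine-error angle loss: smooth-L1 of sin(pred - target)."""
    return smooth_l1(np.sin(pred - target))
```

`angle_loss(pi, 0)` is effectively zero: boxes that differ by π no longer incur a large loss, and the direction classifier disambiguates the two headings.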
Evaluation on KITTI
Performance comparison on KITTI validation set
Performance comparison on KITTI test set
PointPillars [A. Lang+, CVPR2019]
■ Propose an encoder to learn a representation of point clouds organized in vertical columns
(pillars) and generate pseudo 2D image
■ Encoded features can be used with any standard 2D convolutional detection architecture
without computationally-expensive 3D ConvNets
LiDAR ONLY
https://arxiv.org/abs/1812.05784
Pointcloud to Pseudo-Image
The point cloud is discretized into an evenly spaced grid in the x-y plane, creating a set of pillars
Create a dense tensor of size (D, P, N)
D: dimension of the augmented LiDAR point (= 9)
P: number of non-empty pillars per sample
N: number of points per pillar
Apply PointNet to generate a (C, P,
N) sized feature tensor, followed by a
max operation over the channels to
create an output tensor of size (C, P)
Features are scattered back to the original
pillar locations to create a pseudo-image of
size (C, H, W) where H and W indicate the
height and width of the canvas
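The scatter step is a simple indexed assignment (a sketch with illustrative names):

```python
import numpy as np

def scatter_to_canvas(features, coords, H, W):
    """Scatter (C, P) pillar features to a (C, H, W) pseudo-image.

    coords: (P, 2) integer (row, col) grid cell of each pillar; grid
    cells with no pillar simply stay zero.
    """
    C, _ = features.shape
    canvas = np.zeros((C, H, W))
    canvas[:, coords[:, 0], coords[:, 1]] = features
    return canvas
```

The resulting (C, H, W) tensor is what lets PointPillars hand the problem off to a standard 2D detection backbone.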
Backbone
Top-down network produces
features at increasingly
small spatial resolution
Second network performs
upsampling and concatenation
of the top-down features
Detection Head
Single Shot Detector (SSD) is
used with additional regression
targets (height and elevation)
Evaluation on KITTI
Performance comparison on KITTI test set
■ Implementation
■ The official PointPillars implementation was forked from SECOND's implementation and is no longer maintained
■ Instead, SECOND's implementation now supports PointPillars
■ Format Conversion
■ SECOND’s implementation only supports KITTI and nuScenes, so format
conversion is the fastest way to use Waymo Open Dataset
■ Several converters can be found on GitHub
■ Waymo_Kitti_Adapter
■ waymo_kitti_converter
Let’s Try PointPillars on Waymo Open Dataset
These results are just for reference, because only part of the training set is used and hyperparameters are not tuned to the Waymo Open Dataset at all
Vehicle Detection Results
Results from Leaderboard on Waymo Open Dataset
https://waymo.com/open/challenges/3d-detection/#
■ First generate 2D object region proposals in the RGB image using CNN, then each 2D region
is extruded to a 3D viewing frustum to get a point cloud
■ PointNet predicts a 3D bounding box for the object from the points in frustum
Frustum PointNets [C. Qi+, CVPR2018]
LiDAR + Camera
https://arxiv.org/abs/1711.08488
Frustum Proposal
● Use an object detector on the RGB image to predict a 2D bounding box and lift it to a frustum with the known camera matrix
● Collect all points within the frustum to form a frustum point cloud
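Collecting the frustum point cloud reduces to projecting camera-frame points with the intrinsic matrix and masking by the 2D box (a sketch; real pipelines also transform the LiDAR points into the camera frame first):

```python
import numpy as np

def points_in_frustum(points_cam, K, box2d):
    """Keep points whose image projection falls inside a 2D box.

    points_cam: (N, 3) points already expressed in the camera frame.
    K: (3, 3) intrinsic matrix; box2d: (left, top, right, bottom) pixels.
    """
    in_front = points_cam[:, 2] > 0  # only points ahead of the camera
    uvw = points_cam @ K.T           # homogeneous pixel coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]    # perspective divide
    left, top, right, bottom = box2d
    inside = ((uv[:, 0] >= left) & (uv[:, 0] <= right) &
              (uv[:, 1] >= top) & (uv[:, 1] <= bottom))
    return points_cam[in_front & inside]
```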
3D Instance Segmentation
Object instance is segmented by
binary classification of each point
using PointNet
Amodal 3D Box Estimation
Estimate the object's amodal oriented 3D bounding box using a box-regression PointNet
Estimate the true center of the complete object, then transform the coordinates so that the predicted center becomes the origin
Evaluation on KITTI
Performance comparison on KITTI validation set
Performance comparison on KITTI test set
PV-RCNN [S. Shi+, CVPR2020]
https://arxiv.org/abs/1912.13192
■ Voxel-based operation efficiently encodes multi-scale feature representations and can
generate high-quality 3D proposals, while the PointNet-based set abstraction operation
preserves accurate location information with flexible receptive fields
■ Integrate the two operations via the voxel-to-keypoint 3D scene encoding and the keypoint-to-grid RoI feature abstraction
LiDAR ONLY
3D Voxel CNN for Feature Encoding and Proposal Generation
Input points are first divided into
voxels and gradually converted into
feature volumes by 3D sparse CNN
By converting the 3D feature volumes into 2D bird's-eye-view feature maps, high-quality 3D proposals are generated following anchor-based approaches
Voxel-to-keypoint Scene Encoding via Voxel Set Abstraction
A small number of keypoints is sampled from the point cloud
A PointNet-based set abstraction module encodes multi-scale semantic features from the 3D CNN feature volumes into the keypoints
Each keypoint is checked for being inside or outside a ground-truth 3D box, and the keypoint features are re-weighted accordingly
Keypoint-to-grid RoI Feature Abstraction for Proposal Refinement
RoI-grid pooling module
aggregates the keypoint
features to the RoI-grid
points with multiple
receptive fields using
PointNet
Evaluation on KITTI / Waymo Open Dataset
Performance comparison on KITTI test set
Performance comparison on Waymo OD validation set
Do We Even Need Cameras?
3D vehicle detection performance on KITTI test set (moderate)
LiDAR only
LiDAR + Camera
■ Autonomous Driving Datasets
■ KITTI is the most famous and frequently used dataset for vehicle-related research; however, it is limited in size, and performance on it is saturating (> 80% AP)
■ More recent datasets provide much larger multi-modal sensor data and annotations, and some also provide semantic maps
■ Waymo Open Dataset is one of the largest and most diverse datasets ever released and provides high-quality (meta)data and annotations (but unfortunately, it is NOT commercial-friendly at all)
■ 3D Object Detection Algorithms
■ Recent 3D object detection algorithms re-purpose camera-based detection architectures, which have been greatly advanced by CNNs and many mature techniques such as region proposals
■ The two main streams are grid-based methods and point-based methods; the key component of the former is the 2D/3D CNN, and of the latter, PointNet
■ The current state of the art is dominated by LiDAR-only methods, while LiDAR-camera fusion methods lag behind
Summary
More Related Content

What's hot

Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)
nikhilus85
 
Comparative study on image segmentation techniques
Comparative study on image segmentation techniquesComparative study on image segmentation techniques
Comparative study on image segmentation techniques
gmidhubala
 

What's hot (20)

Yolov5
Yolov5 Yolov5
Yolov5
 
Vehicle detection
Vehicle detectionVehicle detection
Vehicle detection
 
Lidar for Autonomous Driving II (via Deep Learning)
Lidar for Autonomous Driving II (via Deep Learning)Lidar for Autonomous Driving II (via Deep Learning)
Lidar for Autonomous Driving II (via Deep Learning)
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
 
3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II
 
Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)
 
A Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth EstimationA Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth Estimation
 
Real Time Object Tracking
Real Time Object TrackingReal Time Object Tracking
Real Time Object Tracking
 
Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...
 
camera-based Lane detection by deep learning
camera-based Lane detection by deep learningcamera-based Lane detection by deep learning
camera-based Lane detection by deep learning
 
Moving object detection
Moving object detectionMoving object detection
Moving object detection
 
Depth estimation using deep learning
Depth estimation using deep learningDepth estimation using deep learning
Depth estimation using deep learning
 
Camera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning IICamera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning II
 
Traffic sign recognition
Traffic sign recognitionTraffic sign recognition
Traffic sign recognition
 
Deep VO and SLAM
Deep VO and SLAMDeep VO and SLAM
Deep VO and SLAM
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkObject Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning Framework
 
Passive stereo vision with deep learning
Passive stereo vision with deep learningPassive stereo vision with deep learning
Passive stereo vision with deep learning
 
Moving Object Detection And Tracking Using CNN
Moving Object Detection And Tracking Using CNNMoving Object Detection And Tracking Using CNN
Moving Object Detection And Tracking Using CNN
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IV
 
Comparative study on image segmentation techniques
Comparative study on image segmentation techniquesComparative study on image segmentation techniques
Comparative study on image segmentation techniques
 

Similar to 3D Perception for Autonomous Driving - Datasets and Algorithms -

Design of Image Segmentation Algorithm for Autonomous Vehicle Navigationusing...
Design of Image Segmentation Algorithm for Autonomous Vehicle Navigationusing...Design of Image Segmentation Algorithm for Autonomous Vehicle Navigationusing...
Design of Image Segmentation Algorithm for Autonomous Vehicle Navigationusing...
IJEEE
 
What we've done so far with mago3D, an open source based 'Digital Twin' platf...
What we've done so far with mago3D, an open source based 'Digital Twin' platf...What we've done so far with mago3D, an open source based 'Digital Twin' platf...
What we've done so far with mago3D, an open source based 'Digital Twin' platf...
SANGHEE SHIN
 
라이브드론맵 (Live Drone Map) - 실시간 드론 매핑 솔루션
라이브드론맵 (Live Drone Map) - 실시간 드론 매핑 솔루션라이브드론맵 (Live Drone Map) - 실시간 드론 매핑 솔루션
라이브드론맵 (Live Drone Map) - 실시간 드론 매핑 솔루션
Impyeong Lee
 
Presentation Object Recognition And Tracking Project
Presentation Object Recognition And Tracking ProjectPresentation Object Recognition And Tracking Project
Presentation Object Recognition And Tracking Project
Prathamesh Joshi
 

Similar to 3D Perception for Autonomous Driving - Datasets and Algorithms - (20)

fyp presentation of group 43011 final.pptx
fyp presentation of group 43011 final.pptxfyp presentation of group 43011 final.pptx
fyp presentation of group 43011 final.pptx
 
License Plate Recognition
License Plate RecognitionLicense Plate Recognition
License Plate Recognition
 
Design of Image Segmentation Algorithm for Autonomous Vehicle Navigationusing...
Design of Image Segmentation Algorithm for Autonomous Vehicle Navigationusing...Design of Image Segmentation Algorithm for Autonomous Vehicle Navigationusing...
Design of Image Segmentation Algorithm for Autonomous Vehicle Navigationusing...
 
IRJET - Floor Cleaning Robot with Vision
IRJET - Floor Cleaning Robot with VisionIRJET - Floor Cleaning Robot with Vision
IRJET - Floor Cleaning Robot with Vision
 
“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group
“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group
“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group
 
IRJET - Vehicle Signal Breaking Alert System
IRJET - Vehicle Signal Breaking Alert SystemIRJET - Vehicle Signal Breaking Alert System
IRJET - Vehicle Signal Breaking Alert System
 
IRJET- Proposed Design for 3D Map Generation using UAV
IRJET- Proposed Design for 3D Map Generation using UAVIRJET- Proposed Design for 3D Map Generation using UAV
IRJET- Proposed Design for 3D Map Generation using UAV
 
What we've done so far with mago3D, an open source based 'Digital Twin' platf...
What we've done so far with mago3D, an open source based 'Digital Twin' platf...What we've done so far with mago3D, an open source based 'Digital Twin' platf...
What we've done so far with mago3D, an open source based 'Digital Twin' platf...
 
DESIGN & DEVELOPMENT OF UNMANNED GROUND VEHICLE
DESIGN & DEVELOPMENT OF UNMANNED GROUND VEHICLEDESIGN & DEVELOPMENT OF UNMANNED GROUND VEHICLE
DESIGN & DEVELOPMENT OF UNMANNED GROUND VEHICLE
 
mago3D, A Brand-New Web Based Open Source GeoBIM Platform
mago3D, A Brand-New Web Based Open Source GeoBIM Platformmago3D, A Brand-New Web Based Open Source GeoBIM Platform
mago3D, A Brand-New Web Based Open Source GeoBIM Platform
 
라이브드론맵 (Live Drone Map) - 실시간 드론 매핑 솔루션
라이브드론맵 (Live Drone Map) - 실시간 드론 매핑 솔루션라이브드론맵 (Live Drone Map) - 실시간 드론 매핑 솔루션
라이브드론맵 (Live Drone Map) - 실시간 드론 매핑 솔루션
 
Worknet smart pole overview
Worknet smart pole overviewWorknet smart pole overview
Worknet smart pole overview
 
Presentation Object Recognition And Tracking Project
Presentation Object Recognition And Tracking ProjectPresentation Object Recognition And Tracking Project
Presentation Object Recognition And Tracking Project
 
COMPARATIVE STUDY ON AUTOMATED NUMBER PLATE EXTRACTION USING OPEN CV AND MATLAB
COMPARATIVE STUDY ON AUTOMATED NUMBER PLATE EXTRACTION USING OPEN CV AND MATLABCOMPARATIVE STUDY ON AUTOMATED NUMBER PLATE EXTRACTION USING OPEN CV AND MATLAB
COMPARATIVE STUDY ON AUTOMATED NUMBER PLATE EXTRACTION USING OPEN CV AND MATLAB
 
Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Kni...
Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Kni...Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Kni...
Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Kni...
 
IRJET - Detection of Landmine using Robotic Vehicle
IRJET -  	  Detection of Landmine using Robotic VehicleIRJET -  	  Detection of Landmine using Robotic Vehicle
IRJET - Detection of Landmine using Robotic Vehicle
 
Introduction to mago3D: A Web Based Open Source GeoBIM Platform
  • 1. Mobility Technologies Co., Ltd. 3D Perception for Autonomous Driving - Datasets and Algorithms - Kazuyuki Miyazawa AI R&D Group 2, AI System Dept. Mobility Technologies Co., Ltd.
  • 2. Mobility Technologies Co., Ltd. Who am I? 2 @kzykmyzw Kazuyuki Miyazawa Group Leader AI R&D Group 2 AI System Dept. Mobility Technologies Co., Ltd. Past Work Experience April 2019 - March 2020 AI Research Engineer@DeNA Co., Ltd. April 2010 - March 2019 Research Scientist@Mitsubishi Electric Corp. Education PhD in Information Science@Tohoku University
  • 3. Mobility Technologies Co., Ltd.3 1 Autonomous Driving Datasets Agenda 2 3D Object Detection Algorithms
  • 4. Mobility Technologies Co., Ltd. 3D Object Detection: Motivation ■ 2D bounding boxes are not sufficient ■ Lack of 3D pose, Occlusion information, and 3D location Preliminary (Today’s Main Topic) 4 2D Object Detection 3D Object Detection http://www.cs.toronto.edu/~byang/
  • 5. Mobility Technologies Co., Ltd. Autonomous Driving Datasets 5 01
  • 6. Mobility Technologies Co., Ltd. KITTI [2012] 6 Sensor Setup ● GPS/IMU x 1 ● LiDAR (64ch) x 1 ● Grayscale Camera (1.4M) x 2 ● Color Camera (1.4M) x 2 http://www.cvlibs.net/datasets/kitti/
  • 7. Mobility Technologies Co., Ltd. KITTI [2012] 7
  • 8. Mobility Technologies Co., Ltd. 3D Object Detection 8 ● 7,481 training images / point clouds ● 7,518 test images / point clouds ● 80,256 labeled objects type Car, Van, Truck, Pedestrian, Person_sitting, Cyclist, Tram, Misc or DontCare truncated 0 to 1, where truncated refers to the object leaving image boundaries occluded 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown alpha Observation angle of object, ranging [-pi..pi] bbox 2D bounding box of object in the image dimensions 3D object dimensions: height, width, length location 3D object location x,y,z in camera coordinates rotation_y Rotation ry around Y-axis in camera coordinates [-pi..pi] Annotations
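As a concrete illustration of this annotation format, one whitespace-separated KITTI label line could be parsed as below. The field order follows the table above; the example line itself is fabricated for illustration.

```python
# Minimal parser for one KITTI 3D-detection label line.
# Field order follows the annotation table on the slide above.
def parse_kitti_label(line):
    f = line.split()
    return {
        "type": f[0],
        "truncated": float(f[1]),
        "occluded": int(f[2]),
        "alpha": float(f[3]),                       # observation angle [-pi..pi]
        "bbox": [float(x) for x in f[4:8]],         # 2D box: left, top, right, bottom
        "dimensions": [float(x) for x in f[8:11]],  # height, width, length [m]
        "location": [float(x) for x in f[11:14]],   # x, y, z in camera coordinates
        "rotation_y": float(f[14]),                 # rotation around Y-axis [-pi..pi]
    }

# Fabricated example line (values are not from the dataset):
example = "Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59"
label = parse_kitti_label(example)
```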
  • 9. Mobility Technologies Co., Ltd. License 9
  • 10. Mobility Technologies Co., Ltd. Variants of KITTI 10 SemanticKITTI Dataset provides annotations that associate each LiDAR point with one of 28 semantic classes in all 22 sequences of the KITTI Dataset http://semantic-kitti.org/ Virtual KITTI contains 50 high-resolution monocular videos (21,260 frames) generated from five different virtual worlds in urban settings under different imaging and weather conditions https://europe.naverlabs.com/research/computer-vision/proxy-virtual-worlds/
  • 11. Mobility Technologies Co., Ltd. ApolloScape [2017] 11 Sensor Setup ● GPS/IMU x 1 ● LiDAR x 2 ● Color Camera (9.2M) x 2 http://apolloscape.auto/
  • 12. Mobility Technologies Co., Ltd. ApolloScape [2017] 12 Scene Parsing 3D Car Instance Lane Segmentation
  • 13. Mobility Technologies Co., Ltd. ApolloScape [2017] 13 Self Localization Stereo
  • 14. Mobility Technologies Co., Ltd. 3D Object Detection 14 ● 53 min training sequences ● 50 min testing sequences ● 70K 3D fitted cars type Small vehicle, Big vehicle, Pedestrian, Motorcyclist and Bicyclist, Traffic cones, Others dimensions 3D object dimensions: height, width, length location 3D object location x,y,z in relative coordinate heading Steering radian with respect to the direction of the object Annotations
  • 15. Mobility Technologies Co., Ltd. ■ To the extent that we authorize the Developer to use Datasets and subject to the terms of this Agreement, the Developer is entitled to use the Datasets only (i) for Developer’s internal purposes of non-commercial research or teaching and (ii) in accordance with the terms of this Agreement. License 15 http://apolloscape.auto/license.html
  • 16. Mobility Technologies Co., Ltd. nuScenes [2019] 16 Sensor Setup ● GPS/IMU x 1 ● LiDAR (32ch) x 1 ● RADAR x 5 ● Color Camera (1.4M) x 3 https://www.nuscenes.org/
  • 17. Mobility Technologies Co., Ltd. Semantic Map 17 ● Provide highly accurate human-annotated semantic maps of the relevant areas ● 11 semantic classes ● Encourage the use of localization and semantic maps as strong priors for all tasks
  • 18. Mobility Technologies Co., Ltd. 3D Object Detection 18 ● category ● attribute ● visibility ● instance ● sensor ● calibrated_sensor ● ego_pose ● log ● scene ● sample ● sample_data ● sample_annotation ● map Number of annotations per category Attributes distribution for selected categories 1.4M boxes in total
  • 19. Mobility Technologies Co., Ltd. License 19
  • 20. Mobility Technologies Co., Ltd. Argoverse [2019] 20 Sensor Setup ● GPS x 1 ● LiDAR (32ch) x 2 ● Color Camera (4.8M) x 2 ● Color Camera (2M) x 7 https://www.argoverse.org/
  • 21. Mobility Technologies Co., Ltd. Argoverse Maps 21 Vector Map: Lane-Level Geometry Rasterized Map: Ground Height Rasterized Map: Drivable Area
  • 22. Mobility Technologies Co., Ltd. 3D Object Detection (3D Tracking) 22 ● Collection of 113 log segments with 3D object tracking annotations ● These log segments vary in length from 15 to 30 seconds and contain a total of 11,052 tracks ● Each sequence includes annotations for all objects within 5 meters of “drivable area” — the area in which it is possible for a vehicle to drive
  • 23. Mobility Technologies Co., Ltd. License 23
  • 24. Mobility Technologies Co., Ltd. Lyft Level 5 [2019] 24 Sensor Setup (BETA_V0) ● LiDAR (40ch) x 3 ● WFOV Camera (1.2M) x 6 ● Long-focal-length Camera (1.7M) x 1 Sensor Setup (BETA_++) ● LiDAR (64ch) x 1 ● LiDAR (40ch) x 2 ● WFOV Camera (2M) x 6 ● Long-focal-length Camera (2M) x 1 https://level5.lyft.com/dataset/
  • 25. Mobility Technologies Co., Ltd. Semantic Map 25
  • 26. Mobility Technologies Co., Ltd. 3D Object Detection (Same format as nuScenes) 26 ● category ● attribute ● visibility ● instance ● sensor ● calibrated_sensor ● ego_pose ● log ● scene ● sample ● sample_data ● sample_annotation ● map animal bicycle bus car emergency_vehicle motorcycle other_vehicle pedestrian truck 638K boxes in total
  • 27. Mobility Technologies Co., Ltd. License 27
  • 28. Mobility Technologies Co., Ltd. Audi Autonomous Driving Dataset (A2D2) [2020] 28 Sensor Setup ● GPS/IMU x 1 ● LiDAR (16ch) x 5 ● Color Camera (2.3M) x 6 https://www.a2d2.audi/a2d2/en.html
  • 29. Mobility Technologies Co., Ltd. Audi Autonomous Driving Dataset (A2D2) [2020] 29
  • 30. Mobility Technologies Co., Ltd. 3D Object Detection 30 ● All images have corresponding LiDAR point clouds, of which 12,497 are annotated with 3D bounding boxes within the field of view of the front-center camera
  • 31. Mobility Technologies Co., Ltd. License 31
  • 32. Mobility Technologies Co., Ltd. Comparison 32 [dataset comparison table, with one dataset anonymized as "?"] These figures are based on Table 1 in https://arxiv.org/abs/1912.04838
  • 33. Mobility Technologies Co., Ltd. Comparison 33 [the same comparison table, with the anonymized dataset revealed as Waymo] These figures are based on Table 1 in https://arxiv.org/abs/1912.04838
  • 34. Mobility Technologies Co., Ltd. Waymo Open Dataset [2019] 34 Sensor Setup ● Mid-Range (~75m) LiDAR x 1 ● Short-Range (~20m) LiDAR x 4 ● Color Camera (2M) x 3 ● Color Camera (1.6M) x 2 https://waymo.com/open/
  • 35. Mobility Technologies Co., Ltd. Data Volume 35 Train 798 segments w/ labels (757 GB) Test 150 seg. w/o labels (192 GB) Validation 202 seg. w/ labels (144 GB) ● Contains 1,150 segments, each spanning 20 seconds ● Additionally, segments from a new location, only a subset of which are labeled, are provided for domain adaptation
  • 36. Mobility Technologies Co., Ltd. Data Format 36 Segment Frame context Shared information among all frames in the scene (e.g., calibration parameters, stats) timestamp_micros Frame timestamp pose Vehicle pose images Camera images and metadata (e.g., pose, velocity, timestamp) lasers Range images laser_labels 3D box annotations projected_lidar_labels Lidar labels (laser_labels) projected to camera images camera_labels 2D box annotations no_label_zones Polygon that represents areas without labels (e.g., opposite side of a highway) Frame ... ● Each segment (20 sec) consists of ~200 frames (10 Hz) ● All the data related to a segment is stored in a single tfrecord and represented as protocol buffers
  • 37. Mobility Technologies Co., Ltd. Range Image 37 The point cloud of each LiDAR is encoded as a range image with two returns (1st and 2nd), each storing range, intensity, and elongation channels
  • 38. Mobility Technologies Co., Ltd. API & Tutorial in colab 38 https://github.com/waymo-research/waymo-open-dataset https://colab.research.google.com/github/waymo-research/waymo-open-dataset/blob/master/tutorial/tutorial.ipynb
  • 39. Mobility Technologies Co., Ltd. Data Visualization (LiDAR Point Cloud) 39 Mid-range LiDAR
  • 40. Mobility Technologies Co., Ltd. Data Visualization (LiDAR Point Cloud) 40 Mid-range LiDAR
  • 41. Mobility Technologies Co., Ltd. Data Visualization (LiDAR Point Cloud) 41 Mid-range LiDAR Short-range LiDAR (front)
  • 42. Mobility Technologies Co., Ltd. Data Visualization (LiDAR Point Cloud) 42 Mid-range LiDAR Short-range LiDAR (right)
  • 43. Mobility Technologies Co., Ltd. Data Visualization (LiDAR Point Cloud) 43 Mid-range LiDAR Short-range LiDAR (rear)
  • 44. Mobility Technologies Co., Ltd. Data Visualization (LiDAR Point Cloud) 44 Mid-range LiDAR Short-range LiDAR (left)
  • 45. Mobility Technologies Co., Ltd. Data Visualization (LiDAR Point Cloud) 45 Mid-range LiDAR Short-range LiDARs (all)
  • 46. Mobility Technologies Co., Ltd. Data Visualization (Camera Images) 46 Front Left 1920x1080 Front 1920x1080 Front Right 1920x1080 Side Left 1920x886 Side Right 1920x886
  • 47. Mobility Technologies Co., Ltd. 3D Object Detection 47 ■ 3D LiDAR Labels ■ 3D 7-DOF bounding boxes in the vehicle frame with globally unique tracking IDs ■ vehicles, pedestrians, cyclists, signs ■ 2D Camera Labels ■ Not projections of 3D labels ■ vehicles, pedestrians, cyclists ■ Tight-fitting, axis-aligned 2D bounding boxes with globally unique tracking IDs Vehicle Pedestrian Cyclists Signs 3D Object 6.1M 2.8M 67K 3.2M 3D TrackID 60K 23K 620 23K 2D Object 7.7M 2.1M 63K - 2D TrackID 164K 45K 1.3K - Labeled object and tracking ID counts
  • 48. Mobility Technologies Co., Ltd. 2D Label Samples 48
  • 49. Mobility Technologies Co., Ltd. 3D Label Samples 49
  • 50. Mobility Technologies Co., Ltd. LiDAR to Camera Projection 50 ■ Cameras and LiDARs data are well-synchronized ■ LiDAR points can be projected to camera image with rolling shutter effect compensation
  • 51. Mobility Technologies Co., Ltd. Challenges 51
  • 52. Mobility Technologies Co., Ltd. Evaluation Metrics for 3D Object Detection 52 https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-ranked-retrieval-results-1.html P/R Curve Average Precision with Heading: each true positive is weighted by a heading-accuracy term computed from the ground-truth and predicted headings
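The heading weighting (the exact formula appeared as a figure on the slide) can be sketched as follows, assuming the commonly cited form where each true positive is weighted by 1 − Δθ/π, with Δθ the smallest angular difference between the predicted and ground-truth headings:

```python
import math

def heading_weight(theta_pred, theta_gt):
    """Weight for one true positive (assumed form: 1 - d/pi, where d is the
    smallest angular difference between prediction and ground truth)."""
    d = abs(theta_pred - theta_gt) % (2 * math.pi)
    d = min(d, 2 * math.pi - d)     # wrap the difference into [0, pi]
    return 1.0 - d / math.pi

# A perfect heading keeps full weight; a heading flipped by pi contributes
# nothing to APH, even though the box itself may overlap perfectly.
```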
  • 53. Mobility Technologies Co., Ltd. To ensure the Dataset is only used for Non-Commercial Purposes, You agree ■ Not to distribute or publish any models trained on or refined using the Dataset, or the weights or biases from such trained models ■ Not to use or deploy the Dataset, any models trained on or refined using the Dataset, or the weights or biases from such trained models (i) in operation of a vehicle or to assist in the operation of a vehicle, (ii) in any Production Systems, or (iii) for any other primarily commercial purposes License 53 https://waymo.com/open/terms/
  • 54. Mobility Technologies Co., Ltd. 3D Object Detection Algorithms 54 02
  • 55. Mobility Technologies Co., Ltd. ■ Design a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input ■ Provide a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing PointNet [C. Qi+, CVPR2017] 55 https://arxiv.org/abs/1612.00593
  • 56. Mobility Technologies Co., Ltd. PointNet Architecture 56
  • 57. Mobility Technologies Co., Ltd. PointNet Architecture 57 Predict an affine transformation matrix by a mini-network and align all input set to achieve invariance against geometric transformations
  • 58. Mobility Technologies Co., Ltd. PointNet Architecture 58 The same alignment approach is also applied in feature space
  • 59. Mobility Technologies Co., Ltd. PointNet Architecture 59 Using max pooling as a symmetric function, aggregate unordered point features
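The permutation invariance that max pooling provides can be demonstrated with a minimal numpy sketch; a single random-weight layer stands in for the full shared MLP of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared per-point layer: every point is mapped by the SAME weights, then
# max pooling aggregates an order-independent global feature.
W = rng.standard_normal((3, 16))   # 3D point -> 16-dim feature

def global_feature(points):        # points: (N, 3)
    per_point = np.maximum(points @ W, 0.0)   # shared "MLP" + ReLU
    return per_point.max(axis=0)              # symmetric max pooling

pts = rng.standard_normal((128, 3))
perm = rng.permutation(128)

# Shuffling the input points leaves the global feature unchanged.
assert np.allclose(global_feature(pts), global_feature(pts[perm]))
```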
  • 60. Mobility Technologies Co., Ltd. ■ Divide a point cloud into 3D voxels and transform them into a unified feature representation ■ Descriptive volumetric representation is then connected to an RPN to generate detections VoxelNet [Y. Zhou+, CVPR2018] 60 A voxel represents a value on a regular grid in three-dimensional space https://en.wikipedia.org/wiki/Voxel LiDAR ONLY https://arxiv.org/abs/1711.06396
  • 61. Mobility Technologies Co., Ltd. Voxel Feature Encoding (VFE) Layer 61 ● VFE enables inter-point interaction within a voxel, by combining point-wise features with a locally aggregated feature. ● Stacking multiple VFE layers allows learning complex features for characterizing local 3D shape information
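A single VFE layer can be sketched in numpy with toy dimensions; the learned fully connected layer is reduced to one random weight matrix, which is only an illustration of the point-wise/aggregated concatenation, not the trained network:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((7, 8))    # point-wise FC: 7-dim input -> 8-dim feature

def vfe_layer(voxel_points):
    """One VFE layer sketch: per-point features are max-pooled within the
    voxel, and the pooled vector is concatenated back onto every point."""
    f = np.maximum(voxel_points @ W, 0.0)        # (N, 8) point-wise features
    agg = f.max(axis=0, keepdims=True)           # (1, 8) locally aggregated
    return np.concatenate([f, np.repeat(agg, f.shape[0], axis=0)], axis=1)

voxel = rng.standard_normal((35, 7))   # e.g. x,y,z,reflectance + centroid offsets
out = vfe_layer(voxel)                 # (35, 16): point-wise ++ aggregated
```

Stacking several such layers, as the slide notes, lets each point's feature mix in progressively richer voxel-level context.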
  • 62. Mobility Technologies Co., Ltd. Convolutional Middle Layers 62 ● Each convolutional middle layer applies 3D convolution, BN layer, and ReLU layer sequentially ● Convolutional middle layers aggregate voxel-wise features within a progressively expanding receptive field, adding more context to the shape description
  • 63. Mobility Technologies Co., Ltd. Region Proposal Network 63 ● The first layer of each block downsamples the input feature map ● Then the output of every block is upsampled to a fixed size and concatenated to construct the high resolution feature map ● Finally, this feature map is mapped to the desired learning targets
  • 64. Mobility Technologies Co., Ltd. Evaluation on KITTI 64 Performance comparison on KITTI validation set Performance comparison on KITTI test set
  • 65. Mobility Technologies Co., Ltd. ■ Apply sparse convolution to greatly increase the speeds of training and inference ■ Introduce a novel angle loss regression approach to solve the problem of the large loss generated when the angle prediction error is equal to π SECOND (Sparsely Embedded CONvolutional Detection) [Y. Yan+, Sensors 2018] 65 LiDAR ONLY https://pdfs.semanticscholar.org/5125/a16039cabc6320c908a4764f32596e018ad3.pdf
  • 66. Mobility Technologies Co., Ltd. Sparse Convolution Algorithm 66 ■ Gather the necessary input to construct the matrix, perform GEMM, then scatter the data back ■ GPU-based rule generation algorithm is proposed to construct input–output index rule matrix
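The gather-GEMM-scatter pattern can be sketched with toy values for one kernel offset; the rule list below is hypothetical and stands in for the input-output index matrix that SECOND builds on the GPU:

```python
import numpy as np

# One kernel offset of a sparse convolution: `rules` maps active input sites
# to active output sites for this offset (toy values for illustration).
rules = [(0, 0), (1, 0), (2, 1)]            # (input_idx, output_idx) pairs
features = np.array([[1.0], [2.0], [3.0]])  # 3 active input sites, 1 channel
weight = np.array([[2.0]])                  # (in_ch, out_ch) kernel slice

out = np.zeros((2, 1))                      # 2 active output sites
in_idx = [r[0] for r in rules]
out_idx = [r[1] for r in rules]

gathered = features[in_idx]                 # gather the needed inputs
partial = gathered @ weight                 # dense GEMM on the gathered rows
np.add.at(out, out_idx, partial)            # scatter-add back to output sites
# out[0] accumulates inputs 0 and 1; out[1] receives input 2.
```

Only active sites ever enter the GEMM, which is where the speedup over dense 3D convolution comes from.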
  • 67. Mobility Technologies Co., Ltd. ■ Directly predicting the radian offset suffers from an adversarial example problem between the cases of 0 and π radians because they correspond to the same box but generate a large loss when one is misidentified as the other ■ Solve this problem by introducing a new angle loss regression: ■ To address the issue that this loss treats boxes with opposite directions as being the same, a simple direction classifier is added to the output of the RPN Sine-Error Loss for Angle Regression 67
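The sine-error idea can be illustrated directly: the regression loss vanishes for both a 0 and a π heading error, so the regressor no longer fights the direction ambiguity, and the added direction classifier resolves which of the two headings is correct. A minimal sketch (absolute sine error stands in for the smooth-L1 wrapper used in practice):

```python
import numpy as np

def angle_loss(theta_pred, theta_gt):
    """Sine-error regression sketch: sin(pred - gt) is zero when the two
    angles differ by 0 or pi, removing the 0-vs-pi adversarial case."""
    return np.abs(np.sin(theta_pred - theta_gt))

# A prediction flipped by pi no longer produces a large regression loss;
# a 90-degree error is still penalized maximally.
```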
  • 68. Mobility Technologies Co., Ltd. Evaluation on KITTI 68 Performance comparison on KITTI validation set Performance comparison on KITTI test set
  • 69. Mobility Technologies Co., Ltd. PointPillars [A. Lang+, CVPR2019] 69 ■ Propose an encoder to learn a representation of point clouds organized in vertical columns (pillars) and generate pseudo 2D image ■ Encoded features can be used with any standard 2D convolutional detection architecture without computationally-expensive 3D ConvNets LiDAR ONLY https://arxiv.org/abs/1812.05784
  • 70. Mobility Technologies Co., Ltd. Pointcloud to Pseudo-Image 70 Point cloud is discretized into an evenly spaced grid in the x-y plane, creating a set of pillars
  • 71. Mobility Technologies Co., Ltd. Pointcloud to Pseudo-Image 71 Create a dense tensor of size (D, P, N) D: Dimension of augmented lidar point (=9) P: Number of non-empty pillars per sample N: Number of points per pillar
  • 72. Mobility Technologies Co., Ltd. Pointcloud to Pseudo-Image 72 Apply PointNet to generate a (C, P, N) sized feature tensor, followed by a max operation over the channels to create an output tensor of size (C, P)
  • 73. Mobility Technologies Co., Ltd. Pointcloud to Pseudo-Image 73 Features are scattered back to the original pillar locations to create a pseudo-image of size (C, H, W) where H and W indicate the height and width of the canvas
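The scatter step can be sketched in a few lines; the sizes and pillar coordinates below are toy values chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
C, H, W = 4, 8, 8                  # feature channels and canvas height/width
P = 5                              # number of non-empty pillars

pillar_features = rng.standard_normal((C, P))             # (C, P) after max over N
pillar_coords = [(0, 1), (2, 3), (4, 4), (7, 0), (5, 6)]  # (row, col) per pillar

# Scatter each pillar's feature vector back to its x-y location, producing a
# dense (C, H, W) pseudo-image that any standard 2D CNN backbone can consume.
canvas = np.zeros((C, H, W))
for p, (r, c) in enumerate(pillar_coords):
    canvas[:, r, c] = pillar_features[:, p]
```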
  • 74. Mobility Technologies Co., Ltd. Backbone 74 Top-down network produces features at increasingly small spatial resolution Second network performs upsampling and concatenation of the top-down features
  • 75. Mobility Technologies Co., Ltd. Detection Head 75 Single Shot Detector (SSD) is used with additional regression targets (height and elevation)
  • 76. Mobility Technologies Co., Ltd. Evaluation on KITTI 76 Performance comparison on KITTI test set
  • 77. Mobility Technologies Co., Ltd. ■ Implementation ■ The official PointPillars implementation is forked from SECOND’s implementation and is no longer maintained ■ Instead, SECOND’s implementation now supports PointPillars ■ Format Conversion ■ SECOND’s implementation only supports KITTI and nuScenes, so format conversion is the fastest way to use Waymo Open Dataset ■ Several converters can be found on GitHub ■ Waymo_Kitti_Adapter ■ waymo_kitti_converter Let’s Try PointPillars on Waymo Open Dataset 77
  • 78. Mobility Technologies Co., Ltd. These results are just for reference, because only part of the training set is used and hyperparameters are not tuned to Waymo Open Dataset at all Vehicle Detection Results 78
  • 79. Mobility Technologies Co., Ltd. These results are just for reference, because only part of the training set is used and hyperparameters are not tuned to Waymo Open Dataset at all Vehicle Detection Results 79
  • 80. Mobility Technologies Co., Ltd. Results from Leaderboard on Waymo Open Dataset 80 https://waymo.com/open/challenges/3d-detection/#
  • 81. Mobility Technologies Co., Ltd. ■ First generate 2D object region proposals in the RGB image using a CNN, then each 2D region is extruded to a 3D viewing frustum to get a point cloud ■ PointNet predicts a 3D bounding box for the object from the points in the frustum Frustum PointNets [C. Qi+, CVPR2018] 81 LiDAR + Camera https://arxiv.org/abs/1711.08488
  • 82. Mobility Technologies Co., Ltd. Frustum Proposal 82 ● Use object detector in RGB image to predict a 2D bounding box and lift it to a frustum with a known camera matrix ● Collect all points within the frustum to form a frustum point cloud
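Collecting a frustum point cloud from a 2D box can be sketched as follows; the intrinsic matrix and points below are toy values, and the points are assumed to already be in the camera frame with z pointing forward:

```python
import numpy as np

# Toy pinhole intrinsics: focal length 100 px, principal point (50, 50).
K = np.array([[100.0,   0.0, 50.0],
              [  0.0, 100.0, 50.0],
              [  0.0,   0.0,  1.0]])

def frustum_points(points, box2d):
    """Keep the 3D points (camera frame, z forward) whose image projection
    falls inside the 2D detection box (left, top, right, bottom)."""
    uvw = points @ K.T                       # project: (N, 3) homogeneous
    uv = uvw[:, :2] / uvw[:, 2:3]            # perspective divide -> pixels
    l, t, r, b = box2d
    keep = (uv[:, 0] >= l) & (uv[:, 0] <= r) & (uv[:, 1] >= t) & (uv[:, 1] <= b)
    return points[keep]

pts = np.array([[0.0, 0.0, 10.0],    # projects to the principal point (50, 50)
                [5.0, 0.0, 10.0],    # projects to (100, 50), outside the box
                [0.1, 0.1, 10.0]])   # projects to (51, 51), inside
inside = frustum_points(pts, (40, 40, 60, 60))   # frustum point cloud
```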
  • 83. Mobility Technologies Co., Ltd. 3D Instance Segmentation 83 Object instance is segmented by binary classification of each point using PointNet
  • 84. Mobility Technologies Co., Ltd. Amodal 3D Box Estimation 84 Estimate the object’s amodal oriented 3D bounding box by using a box regression PointNet Estimate the true center of the complete object and then transform the coordinate such that the predicted center becomes the origin
  • 85. Mobility Technologies Co., Ltd. Evaluation on KITTI 85 Performance comparison on KITTI validation set Performance comparison on KITTI test set
  • 86. Mobility Technologies Co., Ltd. PV-RCNN [S. Shi+, CVPR2020] 86 https://arxiv.org/abs/1912.13192 ■ Voxel-based operation efficiently encodes multi-scale feature representations and can generate high-quality 3D proposals, while the PointNet-based set abstraction operation preserves accurate location information with flexible receptive fields ■ Integrate the two operations via the voxel-to-keypoint 3D scene encoding and the keypoint-to-grid RoI feature abstraction LiDAR ONLY
  • 87. Mobility Technologies Co., Ltd. 3D Voxel CNN for Feature Encoding and Proposal Generation 87 Input points are first divided into voxels and gradually converted into feature volumes by 3D sparse CNN By converting 3D feature volumes into 2D bird-view feature maps, high-quality 3D proposals are generated following the anchor- based approaches
  • 88. Mobility Technologies Co., Ltd. Voxel-to-keypoint Scene Encoding via Voxel Set Abstraction 88 Small number of keypoints are sampled from the point clouds PointNet-based set abstraction module encodes the multi-scale semantic features from the 3D CNN feature volumes to the keypoints. Check if each key point is inside or outside of a ground-truth 3D box, and re-weight the keypoint features
  • 89. Mobility Technologies Co., Ltd. Keypoint-to-grid RoI Feature Abstraction for Proposal Refinement 89 RoI-grid pooling module aggregates the keypoint features to the RoI-grid points with multiple receptive fields using PointNet
  • 90. Mobility Technologies Co., Ltd. Evaluation on KITTI / Waymo Open Dataset 90 Performance comparison on KITTI test set Performance comparison on Waymo OD validation set
  • 91. Mobility Technologies Co., Ltd. We Don’t Need Camera? 91 3D vehicle detection performance on KITTI test set (moderate) LiDAR only LiDAR + Camera
  • 92. Mobility Technologies Co., Ltd. ■ Autonomous Driving Datasets ■ KITTI is the most famous and frequently used dataset for vehicle-related research; however, it is limited in size, and performance on it is saturating (> 80% AP) ■ More recent datasets provide much larger multi-modal sensor data and annotations, and some also provide semantic maps ■ Waymo Open Dataset is one of the largest and most diverse datasets ever released, and provides high-quality (meta)data and annotations (but unfortunately, it’s NOT commercial-friendly at all) ■ 3D Object Detection Algorithms ■ Recent 3D object detection algorithms re-purpose camera-based detection architectures, which have been greatly advanced by CNNs and mature techniques such as region proposals ■ The two main streams are grid-based methods and point-based methods; a key component of the former is the 2D/3D CNN, and of the latter PointNet ■ Current SoTA results are dominated by LiDAR-only methods, while LiDAR-camera fusion methods lag behind Summary 92