Anchor Free Object Detection
by Deep Learning
Yu Huang
Sunnyvale, California
- UnitBox: An Advanced Object Detection Network
- DenseBox
- YOLO v1/v2/v3
- CornerNet
- ExtremeNet
- FSAF: Feature Selective Anchor-Free
- FCOS: Fully Convolutional One-Stage
- FoveaBox
- Center and Scale Prediction: A Box-free Approach for Object Detection
- Region Proposal by Guided Anchoring (GA-RPN)
- CenterNet: Objects as Points
- CenterNet: Keypoint Triplets for Object Detection
- CornerNet-Lite: Efficient Keypoint Based Object Detection
UnitBox: An Advanced Object Detection Network
- Deep CNN methods assume the object bounds to be four independent
variables, which can be regressed separately by the l2 loss.
- Such an oversimplified assumption contradicts the well-accepted
observation that those variables are correlated, resulting in less accurate localization.
- To address the issue, UnitBox uses an Intersection over Union (IoU) loss function for
bounding box prediction, which regresses the four bounds of a predicted box
as a whole unit.
- By combining the IoU loss with a deep fully convolutional network (FCN), UnitBox
performs accurate and efficient localization, is robust to objects of
varied shapes and scales, and converges quickly.
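A minimal Python sketch of the IoU loss, assuming the UnitBox per-pixel parameterization: each pixel inside a box predicts its distances (top, bottom, left, right) to the four bounds, and the loss is -ln(IoU) of the predicted and ground-truth boxes. The helper names `iou_loss` and `l2_loss` are ours. The usage lines illustrate the scale argument: scaling both boxes by 10 leaves the IoU loss unchanged, while the l2 loss grows by a factor of 100.

```python
import math


def iou_loss(pred, target):
    """IoU loss -ln(IoU) for one pixel (UnitBox-style sketch).

    pred, target: (t, b, l, r) distances from the pixel to the top, bottom,
    left and right bounds of the box; all assumed positive.
    """
    pt, pb, pl, pr = pred
    gt, gb, gl, gr = target
    pred_area = (pt + pb) * (pl + pr)
    gt_area = (gt + gb) * (gl + gr)
    # Both boxes contain the pixel, so the overlap extends min() in each direction.
    inter_h = min(pt, gt) + min(pb, gb)
    inter_w = min(pl, gl) + min(pr, gr)
    inter = inter_h * inter_w
    union = pred_area + gt_area - inter
    return -math.log(inter / union)


def l2_loss(pred, target):
    """Sum of squared errors, treating the four bounds as independent variables."""
    return sum((p - g) ** 2 for p, g in zip(pred, target))


small_pred, small_gt = (9, 10, 10, 10), (10, 10, 10, 10)
large_pred, large_gt = (90, 100, 100, 100), (100, 100, 100, 100)
print(iou_loss(small_pred, small_gt), iou_loss(large_pred, large_gt))  # equal
print(l2_loss(small_pred, small_gt), l2_loss(large_pred, large_gt))    # 1 vs 100
```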
UnitBox: An Advanced Object Detection Network
Illustration of IoU loss and l2 loss for
pixel-wise bounding box prediction.
The architecture of the UnitBox network.
UnitBox: An Advanced Object Detection Network
Compared to l2 loss, the IoU loss is much more robust to scale variations for bounding box prediction.
DenseBox: Unifying Landmark Localization and Object Detection
- A fully convolutional neural network (FCN);
- Directly predicts bounding boxes and object class confidences at all
locations and scales of an image;
- Improves accuracy with landmark localization through multi-task learning.
Pipeline:
1) An image pyramid is fed to the network.
2) After several convolution and pooling layers, the feature map is upsampled back and further convolution layers produce the final output.
3) The output feature map is converted to bounding boxes, and non-maximum suppression is applied to all bounding boxes whose scores exceed the threshold.
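Step 3 of the pipeline relies on non-maximum suppression. The sketch below is a plain greedy NMS in Python, assuming boxes are (x1, y1, x2, y2) tuples; the function names and both thresholds are illustrative choices, not DenseBox's published settings.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Greedy NMS: keep the best remaining box, drop boxes that overlap it too much."""
    # Only boxes whose score exceeds the threshold take part, highest score first.
    order = [i for i in sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
             if scores[i] >= score_thresh]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if box_iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.3]
print(nms(boxes, scores))  # [0, 2]
```

Box 1 overlaps box 0 with IoU 81/119 (about 0.68) and is suppressed; box 3 never enters because its score is below the threshold.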
DenseBox: Landmark Localization and Object Detection
DenseBox with landmark localization.
You Only Look Once (YOLO) for Object Detection
The YOLO Detection System
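The system models detection as a regression problem: the image is divided into
an S × S grid, and the predictions are encoded as an S × S × (B · 5 + C) tensor
(7 × 7 × 30 on PASCAL VOC, with S = 7, B = 2 and C = 20). This tensor encodes
bounding boxes, confidences and class probabilities for all objects in the image.
Alternating 1 × 1 convolutional layers reduce the feature space from the
preceding layers. The conv. layers are pre-trained on the ImageNet classification
task at half the resolution (224 × 224 input), and the resolution is then doubled for detection.
Note: compared with region proposal-based methods, YOLO makes more localization errors and has relatively low recall.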
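In YOLO (v1), the grid cell that contains an object's center is responsible for detecting that object, and each cell predicts B boxes (x, y, w, h and a confidence) plus C class probabilities. The sketch below, with hypothetical helper names, computes the output tensor shape and the responsible cell; the 448 × 448 input size in the usage line is an illustrative assumption.

```python
def yolo_output_shape(S=7, B=2, C=20):
    """Shape of the YOLO (v1) output: each of the S x S cells predicts B boxes
    (x, y, w, h, confidence) plus C class probabilities."""
    return (S, S, B * 5 + C)


def responsible_cell(cx, cy, img_w, img_h, S=7):
    """(row, col) of the grid cell containing the object center (cx, cy)."""
    col = min(int(cx / img_w * S), S - 1)  # clamp centers lying exactly on the right edge
    row = min(int(cy / img_h * S), S - 1)
    return row, col


print(yolo_output_shape())                   # (7, 7, 30)
print(responsible_cell(224, 224, 448, 448))  # (3, 3)
```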
YOLO9000: Better, Faster, Stronger
- Detects over 9000 object categories;
- YOLOv2: 76.8 mAP at 67 FPS on VOC 2007, and 78.6 mAP at 40 FPS;
- Jointly trains on COCO detection data and ImageNet classification data;
- Batch Normalization: over 2% improvement in mAP;
- High Resolution Classifier: fine-tunes the classifier at the full 448 × 448 resolution, almost 4% increase in mAP;
- Convolutional With Anchor Boxes: uses anchor boxes to predict bounding boxes;
- Dimension Clusters: runs k-means on the training set bounding boxes to
automatically find good priors, so the network only has to adjust the boxes appropriately;
- Direct location prediction: predicts location coordinates relative to the location of the grid cell;
- Fine-Grained Features: detection runs on a 13 × 13 feature map, and a passthrough layer brings in features from an earlier layer at 26 × 26 resolution;
- Multi-Scale Training: every 10 batches, a new input image size is chosen at random.
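The passthrough layer stacks adjacent 2 × 2 spatial features into the channel dimension, turning a 26 × 26 × 512 map into 13 × 13 × 2048 so that it can be concatenated with the 13 × 13 features. A NumPy sketch of this space-to-depth rearrangement follows; the function name and the channel ordering chosen here are assumptions and may differ from Darknet's own reorg layer.

```python
import numpy as np


def passthrough(x, stride=2):
    """Space-to-depth: (H, W, C) -> (H/stride, W/stride, C*stride*stride)."""
    h, w, c = x.shape
    x = x.reshape(h // stride, stride, w // stride, stride, c)
    x = x.transpose(0, 2, 1, 3, 4)  # gather each stride x stride patch together
    return x.reshape(h // stride, w // stride, stride * stride * c)


print(passthrough(np.zeros((26, 26, 512))).shape)  # (13, 13, 2048)
```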
YOLO9000: Better, Faster, Stronger
Bounding boxes with dimension priors and
location prediction. Predict the width and
height of the box as offsets from cluster
centroids. Predict the center coordinates of the
box relative to the location of filter application
using a sigmoid function.
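The caption corresponds to the YOLOv2 parameterization b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w · exp(t_w), b_h = p_h · exp(t_h), where (c_x, c_y) is the cell offset and (p_w, p_h) is the prior (cluster centroid) size, all in grid units. A small Python sketch; the function names are ours.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(t, cell, prior):
    """Decode raw outputs t = (tx, ty, tw, th) for grid cell (cx, cy) and prior (pw, ph)."""
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = prior
    bx = sigmoid(tx) + cx   # sigmoid keeps the center inside the cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)  # width/height are scaled offsets from the cluster prior
    bh = ph * math.exp(th)
    return bx, by, bw, bh


print(decode_box((0.0, 0.0, 0.0, 0.0), cell=(3, 4), prior=(1.5, 2.0)))  # (3.5, 4.5, 1.5, 2.0)
```

Because σ maps to (0, 1), even extreme raw outputs cannot move the predicted center out of its cell, which is what stabilizes training compared with unconstrained offsets.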
Clustering box dimensions on VOC and COCO. K-means
clustering on the dimensions of bounding boxes to get good
priors for the model. Left: the average IoU obtained with various
choices for k. We find that k = 5 gives a good tradeoff between recall and
complexity of the model. Right: relative centroids for VOC and
COCO. Both sets of priors favor thinner, taller boxes while COCO
has greater variation in size than VOC.
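YOLOv2's dimension clustering uses the distance d(box, centroid) = 1 - IoU(box, centroid) instead of Euclidean distance, so that large boxes do not dominate the error. The pure-Python sketch below compares boxes by width and height only (as if they shared a center); the function names, random initialization, iteration cap and toy data are assumptions.

```python
import random


def wh_iou(a, b):
    """IoU of two boxes given only as (w, h), assuming they share the same center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """k-means on (w, h) pairs with distance d = 1 - IoU."""
    centroids = random.Random(seed).sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Smallest 1 - IoU is the same as largest IoU.
            nearest = max(range(k), key=lambda j: wh_iou(box, centroids[j]))
            clusters[nearest].append(box)
        new_centroids = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids


def avg_iou(boxes, centroids):
    """Average best-centroid IoU: the quantity plotted against k in the figure."""
    return sum(max(wh_iou(b, c) for c in centroids) for b in boxes) / len(boxes)


boxes = [(10, 10), (11, 10), (10, 11), (50, 100), (52, 98), (49, 101)]
anchors = kmeans_anchors(boxes, k=2)
print(sorted(anchors), avg_iou(boxes, anchors))
```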
YOLO9000: Better, Faster, Stronger
- The original YOLO network is based on the GoogLeNet architecture and is faster than VGG-16;
- Darknet-19: 19 convolutional layers and 5 max-pooling layers;
- Training for classification: trained on ImageNet with the Darknet framework and standard data augmentation;
- Training for detection: remove the last conv. layer and add three 3 × 3
conv. layers with 1024 filters each, followed by a final 1 × 1 conv. layer;
- Hierarchical classification: WordNet is simplified into WordTree, a hierarchical model of visual concepts;
- Dataset combination with WordTree: combine labels from ImageNet & COCO;
- Joint classification and detection: use the COCO detection dataset
and the top 9000 classes from the full ImageNet release;
- YOLO9000: WordTree with 9418 classes.
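In WordTree, each node predicts a probability conditioned on its parent, and the absolute probability of a node is the product of the conditional probabilities along its path to the root. The sketch uses a hypothetical, heavily simplified hierarchy and made-up probabilities; the real WordTree has many more intermediate nodes.

```python
# Hypothetical, heavily simplified hierarchy: child -> parent (None marks the root).
PARENT = {
    "physical object": None,
    "animal": "physical object",
    "dog": "animal",
    "terrier": "dog",
    "Norfolk terrier": "terrier",
}


def absolute_prob(node, cond_prob, parent=PARENT):
    """Pr(node) = product of Pr(n | parent(n)) for every n on the path up to the root."""
    p = 1.0
    while node is not None:
        p *= cond_prob[node]
        node = parent[node]
    return p


# Made-up conditional probabilities, as a per-sibling-group softmax might output.
cond = {"physical object": 1.0, "animal": 0.9, "dog": 0.8,
        "terrier": 0.5, "Norfolk terrier": 0.5}
print(absolute_prob("Norfolk terrier", cond))  # about 0.18
```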
YOLO9000: Better, Faster, Stronger
Combining datasets using WordTree hierarchy
YOLOv3: An Incremental Improvement
- Predicts bounding boxes using dimension clusters as anchor
boxes, as in YOLO9000;
- Predicts an objectness score for each bounding box using
logistic regression;
- Uses binary cross-entropy loss for the class predictions;
- Predicts boxes at 3 different scales:
- Extracts features from those scales using a concept similar to
feature pyramid networks;
- Adds several convolutional layers, the last of which predicts a
3-d tensor encoding bounding box, objectness, and class predictions;
- Takes the feature map from 2 layers previous, upsamples it by
2× and then merges it with a feature map from earlier in the network;
- Uses k-means clustering to determine bounding box priors (9 clusters, divided evenly across the 3 scales);
- The backbone is a hybrid between the network used in YOLOv2,
DarkNet-19, and residual networks.
- It uses successive 3 × 3 and 1 × 1 convolutional layers with some shortcut connections;
- It has 53 convolutional layers and is called DarkNet-53;
- At 320 × 320, YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but 3
times faster.
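On COCO, the 3-d output tensor at each scale is N × N × [3 × (4 + 1 + 80)]: 3 boxes per cell, each with 4 box offsets, 1 objectness score and 80 class scores. The sketch computes the three output shapes, assuming downsampling strides of 32, 16 and 8 and a 416 × 416 input as an example; the function name is ours.

```python
def yolov3_output_shapes(input_size=416, num_classes=80, boxes_per_scale=3,
                         strides=(32, 16, 8)):
    """Per-scale output shapes (N, N, channels) for a YOLOv3-style head."""
    channels = boxes_per_scale * (4 + 1 + num_classes)  # 4 offsets + objectness + classes
    return [(input_size // s, input_size // s, channels) for s in strides]


shapes = yolov3_output_shapes()
print(shapes)                                # [(13, 13, 255), (26, 26, 255), (52, 52, 255)]
print(sum(n * m * 3 for n, m, _ in shapes))  # total predicted boxes: 10647
```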
CornerNet: Detecting Objects as Paired Keypoints
- CornerNet detects an object bounding box as a pair of keypoints, the top-left
corner and the bottom-right corner, using a single convolutional neural network.
- By detecting objects as paired keypoints, it eliminates the need for designing a
set of anchor boxes commonly used in prior single-stage detectors.
- In addition, it introduces corner pooling, a new type of pooling layer that helps the network
better localize corners.
Bounding box predictions overlaid on predicted heatmaps of corners.
CornerNet: Detecting Objects as Paired Keypoints
Detect an object as a pair of bounding box corners grouped together. A convolutional network outputs a
heatmap for all top-left corners, a heatmap for all bottom-right corners, and an embedding vector for each
detected corner. The network is trained to predict similar embeddings for corners from the same object.
CornerNet: Detecting Objects as Paired Keypoints
Corner pooling: for each channel, we take the maximum values
(red dots) in two directions (red lines), each from a separate
feature map, and add the two maximums together (blue dot).
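For a top-left corner, corner pooling looks right and downward: at each location it takes the maximum of one feature map over all positions at or below it in the same column, and of a second map over all positions at or to its right in the same row, then adds the two. A single-channel NumPy sketch (bottom-right pooling would flip both directions); the function name is ours.

```python
import numpy as np


def top_left_corner_pool(f_t, f_l):
    """Top-left corner pooling for one channel; f_t and f_l are (H, W) feature maps."""
    # Max over rows i..H-1 in each column: flip vertically, running max, flip back.
    t = np.flip(np.maximum.accumulate(np.flip(f_t, axis=0), axis=0), axis=0)
    # Max over columns j..W-1 in each row: flip horizontally, running max, flip back.
    l = np.flip(np.maximum.accumulate(np.flip(f_l, axis=1), axis=1), axis=1)
    return t + l


f_t = np.zeros((3, 3))
f_t[2, 1] = 5.0
f_l = np.zeros((3, 3))
f_l[0, 2] = 3.0
print(top_left_corner_pool(f_t, f_l))
```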
“Ground-truth” heatmaps for training.
CornerNet: Detecting Objects as Paired Keypoints
The backbone network is followed by two prediction modules, one for the top-left corners and the
other for the bottom-right corners. Using the predictions from both modules, we locate and group
the corners.
A “pull” loss trains the network to group the
corners of the same object, and a “push” loss separates the corners of different objects.
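A sketch of the grouping losses, assuming one-dimensional embeddings and the margin Δ = 1 used by CornerNet: the pull loss draws each object's two corner embeddings toward their mean, and the push loss penalizes pairs of object means that are closer than Δ. The function name and list-based inputs are ours.

```python
def pull_push_losses(e_tl, e_br, delta=1.0):
    """CornerNet-style grouping losses for 1-D embeddings.

    e_tl[k], e_br[k]: embeddings of the top-left and bottom-right corners of object k.
    """
    n = len(e_tl)
    means = [(a + b) / 2 for a, b in zip(e_tl, e_br)]
    # Pull: each corner embedding toward its object's mean embedding.
    pull = sum((a - m) ** 2 + (b - m) ** 2 for a, b, m in zip(e_tl, e_br, means)) / n
    # Push: hinge on the distance between the means of different objects.
    push = 0.0
    if n > 1:
        push = sum(max(0.0, delta - abs(means[k] - means[j]))
                   for k in range(n) for j in range(n) if j != k) / (n * (n - 1))
    return pull, push


print(pull_push_losses([0.0, 2.0], [0.0, 2.0]))  # (0.0, 0.0): grouped and well separated
print(pull_push_losses([0.0, 0.5], [1.0, 0.5]))  # (0.25, 1.0): spread corners, identical means
```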
ExtremeNet: Bottom-up Object Detection by Grouping Extreme and Center Points
- Bottom-up approaches still perform competitively with respect to top-down approaches.
- Detects four extreme points (top-most, left-most, bottom-most, right-most) and one center point of objects using a standard
keypoint estimation network.
- Groups the five keypoints into a bounding box if they are geometrically aligned.
- Object detection then becomes a purely appearance-based keypoint estimation problem,
without region classification or implicit feature learning.
ExtremeNet: Bottom-up Object Detection by Grouping Extreme and Center Points
The network predicts four extreme point heatmaps (top: the heatmap overlaid on the
input image) and one center heatmap (bottom row, left) for each category. Combinations
of the peaks (middle left) of the four extreme point heatmaps are enumerated, and the
geometric center of each composed bounding box is computed (middle right). A bounding box
is produced if and only if its geometric center has a high response in the center heatmap
(bottom right).
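The center check described in the caption can be sketched as follows, assuming each extreme point is an (x, y, score) peak and the center heatmap is indexed [y][x]. The function name, the alignment test and the default threshold `tau_c` are illustrative assumptions; the box score is taken as the average of the five keypoint scores.

```python
def group_extreme_points(top, left, bottom, right, center_heatmap, tau_c=0.1):
    """Return (x1, y1, x2, y2, score) if the five keypoints form a valid box, else None.

    Each extreme point is an (x, y, score) tuple; center_heatmap is indexed [y][x].
    """
    # Simple geometric alignment check (an illustrative assumption).
    if not (left[0] <= top[0] <= right[0] and left[0] <= bottom[0] <= right[0]
            and top[1] <= left[1] <= bottom[1] and top[1] <= right[1] <= bottom[1]):
        return None
    # Geometric center of the box spanned by the four extreme points.
    cx = (left[0] + right[0]) / 2
    cy = (top[1] + bottom[1]) / 2
    center_score = center_heatmap[int(round(cy))][int(round(cx))]
    if center_score < tau_c:
        return None  # no strong center response: reject this combination
    score = (top[2] + left[2] + bottom[2] + right[2] + center_score) / 5
    return (left[0], top[1], right[0], bottom[1], score)


heatmap = [[0.0] * 10 for _ in range(10)]
heatmap[5][5] = 0.9
print(group_extreme_points((5, 1, 0.8), (2, 5, 0.7), (5, 9, 0.6), (8, 5, 0.5), heatmap))
```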
ExtremeNet: Bottom-up Object Detection by Grouping Extreme and Center Points
The network takes an image as input and produces four C-channel extreme point heatmaps, one C-channel
center heatmap, and four 2-channel category-agnostic offset maps. The heatmaps are trained by
weighted pixel-wise logistic regression, where the weight is used to reduce the false-positive penalty
near the ground truth location. The offset maps are trained with a Smooth L1 loss applied at
ground truth peak locations.
ExtremeNet: Bottom-up Object Detection by Grouping Extreme and Center Points
In the case of multiple points being the extreme
point on one edge, our model predicts a segment
of low-confidence responses (a). Edge aggregation
enhances the confidence of the middle pixel (b).
FSAF: Feature Selective Anchor-Free Module
- The feature selective anchor-free (FSAF) module can be plugged into single-shot
detectors with a feature pyramid network (FPN) structure.
- The FSAF module addresses two limitations of anchor-based detection:
  1) heuristic-guided feature selection;
  2) overlap-based anchor sampling.
- The general concept of the FSAF module is online feature selection applied to
the training of multi-level anchor-free branches.
- Specifically, an anchor-free branch is attached to each level of the feature
pyramid, allowing box encoding and decoding in the anchor-free manner at
an arbitrary level.
- During training, each instance is dynamically assigned to the most suitable feature level.
- At inference time, the FSAF module can work jointly with anchor-based
branches by outputting predictions in parallel.
FSAF: Feature Selective Anchor-Free Module
Selected feature level in anchor-based branches may not be optimal.
FSAF module plugged into conventional anchor-based detection methods.
During training, each instance is assigned to a pyramid level via feature selection for setting up supervision.
FSAF: Feature Selective Anchor-Free Module
Supervision for an instance in one feature level of the
anchor-free branches. We use focal loss for
classification and IoU loss for box regression.
Online feature selection mechanism. Each instance is passed through all
levels of anchor-free branches to compute the averaged classification
(focal) loss and regression (IoU) loss over effective regions. Then the level
with minimal summation of two losses is selected to set up the supervision
signals for that instance.
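The online selection rule picks, for each instance, the level whose averaged focal loss plus IoU loss is smallest. The sketch shows the selection step together with a binary focal loss using the common RetinaNet defaults α = 0.25 and γ = 2; computing the effective regions and the IoU loss itself is left out, and the function names are ours.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for a predicted probability p in (0, 1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def select_level(cls_losses, reg_losses):
    """Index of the pyramid level with the smallest summed (focal + IoU) loss.

    cls_losses[l], reg_losses[l]: losses for one instance at level l, each already
    averaged over the instance's effective region at that level.
    """
    totals = [c + r for c, r in zip(cls_losses, reg_losses)]
    return min(range(len(totals)), key=totals.__getitem__)


print(select_level([0.5, 0.2, 0.4], [0.3, 0.2, 0.3]))  # 1
```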
FSAF: Feature Selective Anchor-Free Module
Network architecture of RetinaNet with FSAF module. The FSAF module only introduces two
additional conv layers (dashed feature maps) per pyramid level, keeping the architecture fully convolutional.
FCOS: Fully Convolutional One-Stage Object Detection
- FCOS is a fully convolutional one-stage object detector that solves object
detection in a per-pixel prediction fashion, analogous to semantic segmentation.
- FCOS is anchor-box free as well as proposal free.
- By eliminating the predefined set of anchor boxes, FCOS avoids the
complicated computation related to anchor boxes, such as calculating
overlaps during training, and significantly reduces the training memory footprint.
- It also avoids all hyper-parameters related to anchor boxes, which are often very sensitive to
the final detection performance.
- With non-maximum suppression (NMS) as its only post-processing step, FCOS outperforms previous anchor-based
one-stage detectors while being much simpler.
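In FCOS, a feature-map location (x, y) at stride s maps back to the image point (⌊s/2⌋ + x·s, ⌊s/2⌋ + y·s); if that point falls inside a ground-truth box (x0, y0, x1, y1), its regression target is the distance (l, t, r, b) to the four sides. A minimal sketch following the paper's rule of choosing the smallest-area box when a location falls inside several boxes; the function name and box format are assumptions.

```python
def fcos_targets(fx, fy, stride, gt_boxes):
    """(l, t, r, b) regression target for feature location (fx, fy), or None if negative.

    gt_boxes: list of (x0, y0, x1, y1) in input-image pixels.
    """
    # Map the feature-map location back onto the input image.
    x = stride // 2 + fx * stride
    y = stride // 2 + fy * stride
    best = None
    for x0, y0, x1, y1 in gt_boxes:
        l, t, r, b = x - x0, y - y0, x1 - x, y1 - y
        if min(l, t, r, b) > 0:  # the location falls inside this box
            area = (x1 - x0) * (y1 - y0)
            if best is None or area < best[0]:  # ambiguous location: keep the smaller box
                best = (area, (l, t, r, b))
    return None if best is None else best[1]


boxes = [(0, 0, 100, 100), (10, 10, 40, 40)]
print(fcos_targets(2, 2, 8, boxes))    # (10, 10, 20, 20): maps to (20, 20), inside both; smaller box wins
print(fcos_targets(20, 20, 8, boxes))  # None: maps to (164, 164), outside every box
```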
FCOS: Fully Convolutional One-Stage Object Detection
The network architecture of FCOS, where C3, C4, and C5 denote the feature maps of the backbone
network and P3 to P7 are the feature levels used for the final prediction. H × W is the height and width
of feature maps. ‘/s’ (s = 8, 16, ..., 128) is the down-sampling ratio of the level of feature maps to the
input image. As an example, all the numbers are computed with an 800 × 1024 input.
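The H × W values for each level follow from dividing the 800 × 1024 input by each level's down-sampling ratio. The sketch assumes sizes are rounded up, which is what stride-2 convolutions with padding produce; the function name is ours.

```python
import math


def level_sizes(height, width, strides=(8, 16, 32, 64, 128)):
    """Feature-map size for each pyramid level P3..P7 (rounding up is an assumption)."""
    return {f"P{int(math.log2(s))}": (math.ceil(height / s), math.ceil(width / s))
            for s in strides}


print(level_sizes(800, 1024))
# {'P3': (100, 128), 'P4': (50, 64), 'P5': (25, 32), 'P6': (13, 16), 'P7': (7, 8)}
```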
FCOS: Fully Convolutional One-Stage Object Detection
ResNet-50 is used as the backbone. As shown in the figure, FCOS works well with a wide range of
objects including crowded, occluded, highly overlapped, extremely small and very large objects.
FoveaBox: Beyond Anchor-based Object Detector
- FoveaBox is an accurate and anchor-free framework for object detection.
- The performance of anchor-based detectors is limited by the design of the anchors.
- FoveaBox directly learns the object existence possibility and the bounding box
coordinates without anchor reference, by:
- (a) predicting category-sensitive semantic maps for the object existence possibility,
- (b) producing a category-agnostic bounding box for each position as an object candidate.
- The scales of target boxes are naturally associated with feature pyramid
representations for each input image.
- For objects with arbitrary aspect ratios, FoveaBox brings significant
improvement compared to anchor-based detectors.
- FoveaBox shows great robustness and generalization ability when the
distribution of bounding box shapes changes.
FoveaBox: Beyond Anchor-based Object Detector
FoveaBox object detector. For each output spatial position that potentially presents an object, FoveaBox directly predicts the confidences for all target categories and the bounding box.
FoveaBox network architecture. FoveaBox uses an FPN backbone on top of a feed-forward ResNet architecture. To this backbone, FoveaBox attaches two subnetworks, one for classification and one for bounding box prediction.
FoveaBox: Beyond Anchor-based Object Detector
These results are based on ResNet-101, achieving a single model box AP of 38.9.
Center and Scale Prediction: A Box-free Approach for Object Detection
- CSP scans for feature points all over the image, a task for which convolution is naturally suited.
- This detector goes for a higher-level abstraction, namely central points where there
are objects, relying on deep models capable of high-level semantic abstraction.
- It predicts the scales of the central points, which is also a straightforward convolution.
- Object detection is thus simplified into a straightforward center and scale
prediction task through convolutions.
- Though structurally simple, it achieves competitive accuracy on several
challenging benchmarks, such as pedestrian detection and face detection.
- A cross-dataset evaluation is performed to assess the method's generalization ability.
Center and Scale Prediction: A Box-free Approach for Object Detection
The overall pipeline of the proposed CSP (Center and Scale Prediction) detector. The final
convolutions have two channels, one is a heatmap indicating the locations of the centers (red dots),
and the other serves to predict the scales (yellow dotted lines) for each detected center.
Center and Scale Prediction: A Box-free Approach for Object Detection
Overall architecture of CSP, which mainly comprises two components, i.e. the feature extraction module and
the detection head. The feature extraction module concatenates feature maps of different resolutions into a
single one. The detection head merely contains a 3x3 convolutional layer, followed by two prediction layers,
one for the center location and the other for the corresponding scale.
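Turning the two prediction maps into boxes can be sketched as follows. Several details are assumptions made for illustration: the function name, that the scale map stores log(height) in pixels, the stride of 4, the 3 × 3 local-maximum test, and deriving the width from a fixed aspect ratio (0.41, suited to pedestrians).

```python
import math


def decode_csp(center_map, scale_map, stride=4, thresh=0.5, aspect_ratio=0.41):
    """Turn CSP-style center/scale maps into boxes (x1, y1, x2, y2, score).

    center_map, scale_map: 2-D lists indexed [row][col]; scale_map is assumed to
    hold log(height in pixels), and width = aspect_ratio * height.
    """
    rows, cols = len(center_map), len(center_map[0])
    boxes = []
    for i in range(rows):
        for j in range(cols):
            score = center_map[i][j]
            if score < thresh:
                continue
            # Keep only 3 x 3 local maxima so each object yields one center.
            neighbourhood = [center_map[a][b]
                             for a in range(max(0, i - 1), min(rows, i + 2))
                             for b in range(max(0, j - 1), min(cols, j + 2))]
            if score < max(neighbourhood):
                continue
            h = math.exp(scale_map[i][j])
            w = aspect_ratio * h
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride  # map cell back to image pixels
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score))
    return boxes


centers = [[0.0] * 4 for _ in range(4)]
centers[1][2], centers[1][1] = 0.9, 0.6  # 0.6 is suppressed by its stronger neighbour
scales = [[math.log(100.0)] * 4 for _ in range(4)]
print(decode_csp(centers, scales))
```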
Region Proposal by Guided Anchoring (GA-RPN)
- Guided Anchoring leverages semantic features to guide the anchoring.
- The method jointly predicts the locations where the centers of objects of interest are
likely to exist, as well as the scales and aspect ratios at different locations.
- On top of the predicted anchor shapes, a feature adaption module mitigates the feature inconsistency.
- It uses high-quality proposals to improve detection performance.
- The anchoring scheme can be seamlessly integrated into proposal methods and detectors.
- Code: //
Region Proposal by Guided Anchoring (GA-RPN)
GA-RPN framework. For each output feature map in the feature pyramid, use an anchor
generation module with two branches to predict the anchor location and shape, respectively. Then a
feature adaption module is applied to the original feature map to make the new feature map aware
of anchor shapes.
Region Proposal by Guided Anchoring (GA-RPN)
Anchor location target for multi-level features. Ground truth objects are assigned to
different feature levels according to their scales, and the center region (CR), ignore region (IR) and outside region (OR) are defined.
Region Proposal by Guided Anchoring (GA-RPN)
Examples of RPN proposals (top row) and GA-RPN proposals (bottom row).
CenterNet: Objects as Points
- Detection identifies objects as axis-aligned boxes in an image.
- CenterNet models an object as a single point: the center point of its bounding box.
- The detector uses keypoint estimation to find center points and regresses to all other
object properties, such as size, 3D location, orientation, and even pose.
- The center point based approach, CenterNet, is end-to-end differentiable, simpler,
faster, and more accurate than corresponding bounding box based detectors.
An object is modeled as the center point of its bounding box. The bounding box size and
other object properties are inferred from the keypoint feature at the center.
CenterNet: Objects as Points
Difference between anchor-based detectors (a) and the center point detector (b).
(a) Standard anchor based detection. (b) Center point based detection.
Anchors count as positive with an
overlap IoU > 0.7 to any object,
negative with an overlap IoU < 0.3,
or are ignored otherwise.
The center pixel is assigned to
the object. Nearby points have a
reduced negative loss. Object
size is regressed.
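The reduced negative loss near the center comes from splatting a Gaussian peak onto the target heatmap; the focal-style loss then down-weights negatives by (1 - Y)^β. A NumPy sketch, assuming a given σ (CenterNet derives σ from the object size) and β = 4; the function name is ours.

```python
import numpy as np


def draw_gaussian(heatmap, center, sigma):
    """Splat exp(-d^2 / (2 sigma^2)) around center=(x, y); keep the max where objects overlap."""
    h, w = heatmap.shape
    ys, xs = np.ogrid[0:h, 0:w]
    cx, cy = center
    gaussian = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    np.maximum(heatmap, gaussian, out=heatmap)
    return heatmap


heatmap = draw_gaussian(np.zeros((9, 9)), center=(4, 4), sigma=1.0)
neg_weight = (1.0 - heatmap) ** 4  # penalty weight for negatives; 0 at the center itself
print(heatmap[4, 4], heatmap[4, 5], neg_weight[4, 5], neg_weight[0, 0])
```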
CenterNet: Objects as Points
Model diagrams. The numbers are the stride. (a): Hourglass Network as in CornerNet. (b): ResNet with transpose
convolutions. One 3 × 3 deformable convolutional layer is added before each up-sampling layer. Specifically, deformable
convolution first changes the channels, and transposed convolution then upsamples the feature map (these two steps are
shown separately for 32 → 16, and together as a dashed arrow for 16 → 8 and 8 → 4). (c): The original DLA-34
(Deep Layer Aggregation) for semantic segmentation. (d): Modified DLA-34. More skip connections are added from the bottom layers, and every
convolutional layer in the upsampling stages is upgraded to a deformable convolutional layer.
CenterNet: Object Detection with Keypoint Triplets
- An efficient solution that explores the visual patterns within each cropped
region with minimal costs.
- The framework is built upon a representative one-stage keypoint-based detector
named CornerNet.
- CenterNet detects each object as a triplet, rather than a pair, of keypoints,
which improves both precision and recall.
- Two customized modules, cascade corner pooling and center pooling, play
the roles of enriching the information collected by both the top-left and bottom-right
corners and providing more recognizable information in the central regions, respectively.
CenterNet: Object Detection with Keypoint Triplets
Architecture of CenterNet. A convolutional backbone network applies cascade corner pooling
and center pooling to output two corner heatmaps and a center keypoint heatmap, respectively.
Similar to CornerNet, a pair of detected corners and the similar embeddings are used to detect a
potential bounding box. Then the detected center keypoints are used to determine the final
bounding boxes.
CenterNet: Object Detection with Keypoint Triplets
(a) Center pooling takes the max values in both
horizontal and vertical directions. (b) Corner pooling
only takes the max values in boundary directions. (c)
Cascade corner pooling takes the max values in
both boundary directions and internal directions of objects.
The structures of the center pooling module (a)
and the cascade top corner pooling module (b).
Center pooling and cascade corner pooling can be implemented
by combining corner pooling in different directions.
- CornerNet-Lite is a combination of two efficient variants of CornerNet:
CornerNet-Saccade, which uses an attention mechanism to eliminate the
need for exhaustively processing all pixels of the image, and CornerNet-Squeeze,
which introduces a new compact backbone architecture.
- Together these two variants address the two critical use cases in efficient
object detection: improving efficiency without sacrificing accuracy, and
improving accuracy at real-time efficiency.
- CornerNet-Saccade is suitable for offline processing, improving the
efficiency of CornerNet by 6.0× and the AP by 1.0% on COCO.
- CornerNet-Squeeze is suitable for real-time detection, improving both the
efficiency and accuracy of the popular real-time detector YOLOv3 (34.4%
AP at 34 ms for CornerNet-Squeeze compared to 33.0% AP at 39 ms for
YOLOv3 on COCO).
CornerNet-Lite: Efficient Keypoint Based Object Detection
Overview of CornerNet-Saccade. Predict a set of possible object locations from the attention maps and
bounding boxes generated on a downsized full image. Zoom into each location and crop a small region
around that location. Then detect objects in each region. Control the efficiency by ranking the object
locations and choosing top k locations to process. Finally, merge the detections by NMS.
CornerNet-Lite: Efficient Keypoint Based Object Detection
- In contrast to CornerNet-Saccade, which focuses on a subset of the pixels to
reduce the amount of processing, CornerNet-Squeeze explores an alternative
approach of reducing the amount of processing per pixel.
- In CornerNet, most of the computational resources are spent on Hourglass-104.
- Hourglass-104 is built from residual blocks, each consisting of two 3 × 3 convolution
layers and a skip connection.
- Although Hourglass-104 achieves competitive performance, it is expensive in
terms of number of parameters and inference time.
- To reduce the complexity of Hourglass-104, CornerNet-Squeeze incorporates ideas from SqueezeNet
and MobileNets to design a lightweight hourglass architecture.
- SqueezeNet's three strategies to reduce network complexity: (1) replacing 3 × 3
kernels with 1 × 1 kernels; (2) decreasing the number of input channels to 3 × 3 kernels; (3)
downsampling late.
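The effect of the first two SqueezeNet strategies on the parameter count is simple arithmetic: a k × k convolution from C_in to C_out channels has C_in · C_out · k² weights. The channel counts below are hypothetical, chosen only to illustrate the ratios.

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


baseline = conv_params(256, 256, 3)    # 3 x 3 conv, 256 -> 256 channels
strategy_1 = conv_params(256, 256, 1)  # (1) replace the 3 x 3 kernel with 1 x 1: 9x fewer
strategy_2 = conv_params(64, 256, 3)   # (2) feed the 3 x 3 kernel only 64 input channels: 4x fewer
print(baseline, strategy_1, strategy_2)  # 589824 65536 147456
```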
CornerNet-Lite: Efficient Keypoint Based Object Detection
Qualitative examples on COCO validation set.

Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
Camera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning IICamera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning II
Unsupervised Object Detection
Unsupervised Object DetectionUnsupervised Object Detection
Unsupervised Object Detection
IRJET - Real Time Object Detection using YOLOv3
IRJET - Real Time Object Detection using YOLOv3IRJET - Real Time Object Detection using YOLOv3
IRJET - Real Time Object Detection using YOLOv3

Anchor free object detection by deep learning

  • 1. Anchor Free Object Detection by Deep Learning Yu Huang Sunnyvale, California
  • 2. Outline
    • UnitBox: An Advanced Object Detection Network
    • DenseBox
    • YOLO v1/v2/v3
    • CornerNet
    • ExtremeNet
    • FSAF: Feature Selective Anchor-Free
    • FCOS: Fully Convolutional One-Stage
    • FoveaBox
    • Center and Scale Prediction: A Box-free Approach for Object Detection
    • Region Proposal by Guided Anchoring (GA-RPN)
    • CenterNet: Objects as Points
    • CenterNet: Keypoint Triplets for Object Detection
    • CornerNet-Lite: Efficient Keypoint Based Object Detection
  • 3. UnitBox: An Advanced Object Detection Network
    • Deep CNN methods assume that the four bounds of an object box are independent variables, each of which can be regressed separately with an ℓ2 loss.
    • This oversimplified assumption contradicts the well-established observation that these variables are correlated, and it leads to less accurate localization.
    • To address the issue, UnitBox uses an Intersection over Union (IoU) loss for bounding box prediction, which regresses the four bounds of a predicted box as a whole unit.
    • By combining the IoU loss with a deep FCN, UnitBox performs accurate and efficient localization, is robust to objects of varied shapes and scales, and converges quickly.
  • 4. UnitBox: An Advanced Object Detection Network Illustration of IoU loss and l2 loss for pixel-wise bounding box prediction. The Architecture of UnitBox Network.
  • 5. UnitBox: An Advanced Object Detection Network Compared to l2 loss, the IoU loss is much more robust to scale variations for bounding box prediction.
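The per-pixel IoU loss can be sketched as below. This is a minimal sketch, assuming each pixel predicts its distances to the top, bottom, left and right bounds of the box (so the pixel lies inside both the predicted and the ground-truth box) and that the loss is -ln(IoU); the helper names `unitbox_iou_loss` and `l2_loss` are hypothetical. The demo scales a prediction and its target by 10: the IoU loss stays the same while the ℓ2 loss grows by a factor of 100, which illustrates the scale robustness described above.

```python
import math


def unitbox_iou_loss(pred, gt):
    """IoU loss at one pixel, -ln(IoU), for UnitBox-style distance outputs.

    pred, gt: (top, bottom, left, right) positive distances from the pixel
    to the four box bounds (assumed encoding).
    """
    p_t, p_b, p_l, p_r = pred
    g_t, g_b, g_l, g_r = gt
    pred_area = (p_t + p_b) * (p_l + p_r)
    gt_area = (g_t + g_b) * (g_l + g_r)
    # Both boxes contain the pixel, so the overlap on each axis is the sum
    # of the smaller distances on either side of it.
    inter_h = min(p_t, g_t) + min(p_b, g_b)
    inter_w = min(p_l, g_l) + min(p_r, g_r)
    inter = inter_h * inter_w
    union = pred_area + gt_area - inter
    return -math.log(inter / union)


def l2_loss(pred, gt):
    """Squared error over the four distances, each treated independently."""
    return sum((p - g) ** 2 for p, g in zip(pred, gt))


if __name__ == "__main__":
    pred, gt = (4.0, 6.0, 5.0, 5.0), (5.0, 5.0, 5.0, 5.0)
    for k in (1, 10):
        p = tuple(k * v for v in pred)
        g = tuple(k * v for v in gt)
        # The IoU loss does not change with k; the l2 loss grows with k ** 2.
        print(k, unitbox_iou_loss(p, g), l2_loss(p, g))
```

Because the four distances enter the IoU jointly, a gradient step on one bound is weighed against the others, which is the "whole unit" behavior the slide contrasts with independent ℓ2 regression.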
  • 6. DenseBox: Unifying Landmark Localization and Object Detection
    • A fully convolutional neural network (FCN);
    • Directly predicts bounding boxes and object class confidences at all locations and scales of an image;
    • Improves accuracy by adding landmark localization during multi-task learning.
    Pipeline: 1) An image pyramid is fed to the network. 2) After several layers of convolution and pooling, the feature map is upsampled and further convolution layers produce the final output. 3) The output feature map is converted to bounding boxes, and non-maximum suppression is applied to all bounding boxes whose scores exceed the threshold.
  • 7. DenseBox: Landmark Localization and Object Detection DenseBox Densebox with landmark localization
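Step 3 of the DenseBox pipeline (turning the output map into boxes, then applying NMS) might look like the sketch below. It assumes a hypothetical encoding in which each output pixel carries a score and four distances, in input pixels, to the top, bottom, left and right box edges, and that output pixel (r, c) maps to input location (c·stride, r·stride); neither detail comes from the slides. The NMS shown is the standard greedy variant, and the function names are illustrative.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def decode(score_map, dist_map, stride, score_thr):
    """Convert per-pixel outputs into (box, score) pairs above a threshold.

    score_map[r][c]: confidence at output pixel (r, c).
    dist_map[r][c]: (dt, db, dl, dr) distances to the box edges (assumed).
    """
    boxes = []
    for r, row in enumerate(score_map):
        for c, score in enumerate(row):
            if score < score_thr:
                continue
            cx, cy = c * stride, r * stride  # pixel location in the input image
            dt, db, dl, dr = dist_map[r][c]
            boxes.append(((cx - dl, cy - dt, cx + dr, cy + db), score))
    return boxes


def nms(boxes, iou_thr):
    """Greedy NMS: keep the best box, drop boxes overlapping it too much."""
    remaining = sorted(boxes, key=lambda b: b[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [b for b in remaining if iou(b[0], best[0]) <= iou_thr]
    return kept
```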
  • 8. You Only Look Once (YOLO) for Object Detection The YOLO Detection System. The system models detection as a regression problem to a 7 × 7 × 24 tensor. This tensor encodes bounding boxes and class probabilities for all objects in the image. The network uses strided conv. layers to downsample the feature space instead of maxpooling layers. The conv. layers are pre-trained on the ImageNet classification task, and the input resolution is then doubled for detection. Note: YOLO makes more localization errors and has relatively low recall.
  • 9. YOLO9000: Better, Faster, Stronger
    • Detects over 9000 object categories;
    • YOLOv2 reaches 76.8 mAP on VOC 2007 at 67 FPS, or 78.6 mAP at 40 FPS;
    • Trains jointly on COCO (object detection) and ImageNet (classification);
    • Batch Normalization: 2% improvement in mAP;
    • High Resolution Classifier: full 448 × 448 resolution, almost 4% gain in mAP;
    • Convolutional With Anchor Boxes: anchor boxes are used to predict bounding boxes;
    • Dimension Clusters: k-means on the training set bounding boxes automatically finds good priors, which the network then adjusts appropriately;
    • Direct location prediction: the location is predicted relative to the location of the grid cell;
    • Fine-Grained Features: a passthrough layer brings features from an earlier 26 × 26 layer into the 13 × 13 detection map;
    • Multi-Scale Training: every 10 batches, the network randomly chooses a new input image size.
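The direct location prediction can be written out as in the YOLOv2 paper: b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w·exp(t_w), b_h = p_h·exp(t_h), where (c_x, c_y) is the offset of the grid cell and (p_w, p_h) is the prior's size. The sketch below assumes priors expressed in grid-cell units and a square input image; `decode_yolov2` is a hypothetical helper name.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decode_yolov2(t, cell, prior, grid_size, img_size):
    """Decode one YOLOv2-style prediction into a box in input pixels.

    t: raw outputs (tx, ty, tw, th).
    cell: (cx, cy), column/row index of the grid cell.
    prior: (pw, ph), anchor width/height in grid-cell units (assumption).
    Returns (center_x, center_y, width, height) in pixels.
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = prior
    bx = sigmoid(tx) + cx        # sigmoid keeps the center inside this cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)       # width/height rescale the prior
    bh = ph * math.exp(th)
    scale = img_size / grid_size  # pixels per grid cell
    return bx * scale, by * scale, bw * scale, bh * scale
```

Bounding the center offset with a sigmoid is what makes the parametrization "direct": a prediction cannot drift into a neighboring cell, which the paper credits with more stable training than unconstrained offsets.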
  • 10. YOLO9000: Better, Faster, Stronger Bounding boxes with dimension priors and location prediction: the width and height of the box are predicted as offsets from cluster centroids, and the center coordinates of the box are predicted relative to the location of filter application using a sigmoid function. Clustering box dimensions on VOC and COCO: k-means clustering on the dimensions of bounding boxes gives good priors for the model. Left: the average IoU obtained with various choices of k; k = 5 gives a good tradeoff between recall and model complexity. Right: the relative centroids for VOC and COCO. Both sets of priors favor thinner, taller boxes, while COCO has greater variation in size than VOC.
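The dimension clusters replace Euclidean distance with d(box, centroid) = 1 − IoU(box, centroid), so that large boxes do not dominate the clustering error. A minimal sketch, assuming boxes are given as (width, height) pairs compared as if they shared a center; the function names, the random initialization and the stopping rule are our own choices rather than the paper's.

```python
import random


def wh_iou(a, b):
    """IoU of two (w, h) boxes aligned at a common center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_iou(boxes, k, iters=100, seed=0):
    """k-means over (w, h) pairs using the distance 1 - IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # nearest centroid = smallest 1 - IoU = largest IoU
            best = min(range(k), key=lambda i: 1.0 - wh_iou(b, centroids[i]))
            clusters[best].append(b)
        new = []
        for i, members in enumerate(clusters):
            if not members:  # keep the old centroid of an empty cluster
                new.append(centroids[i])
                continue
            new.append((sum(w for w, _ in members) / len(members),
                        sum(h for _, h in members) / len(members)))
        if new == centroids:  # assignments stopped changing
            break
        centroids = new
    return centroids


def avg_iou(boxes, centroids):
    """Mean IoU between each box and its closest centroid (the plot's y-axis)."""
    return sum(max(wh_iou(b, c) for c in centroids) for b in boxes) / len(boxes)
```

Running `kmeans_iou` for several values of k and plotting `avg_iou` reproduces the kind of recall vs. complexity curve described on this slide.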
  • 11. YOLO9000: Better, Faster, Stronger
    • Based on the GoogLeNet architecture, faster than VGG-16;
    • Darknet-19: 19 convolutional layers and 5 maxpooling layers;
    • Training for classification: Darknet, with data augmentation;
    • Training for detection: remove the last conv. layer and add three 3 × 3 conv. layers with 1024 filters each, followed by a final 1 × 1 conv. layer;
    • Hierarchical classification: WordTree, a hierarchical model of visual concepts built from WordNet;
    • Dataset combination with WordTree: labels from ImageNet and COCO are combined;
    • Joint classification and detection: uses the COCO detection dataset and the top 9000 classes from the full ImageNet release;
    • YOLO9000: WordTree with 9418 classes.
  • 12. YOLO9000: Better, Faster, Stronger Combining datasets using WordTree hierarchy
  • 13. YOLO v3
    • Predicts bounding boxes using dimension clusters as anchor boxes, like YOLO9000;
    • Predicts an objectness score for each bounding box using logistic regression;
    • Uses binary cross-entropy loss for the class predictions;
    • Predicts boxes at 3 different scales:
      • Extracts features from those scales using a concept similar to feature pyramid networks;
      • Adds several convolutional layers, the last of which predicts a 3-D tensor encoding bounding box, objectness, and class predictions;
      • Takes the feature map from 2 layers earlier, upsamples it by 2×, and merges it with a feature map from earlier in the network;
    • Uses k-means clustering to determine bounding box priors (9 clusters).
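The output tensor at each scale follows from the bullets above: each grid cell predicts 3 boxes, and each box carries 4 offsets, 1 objectness score and one independent (binary cross-entropy) score per class, giving N × N × [3 × (4 + 1 + C)]. The sketch assumes the commonly used strides 32, 16 and 8 and an input size divisible by 32; with 3 boxes per scale, the 9 k-means priors are split 3 per scale. The function names are hypothetical.

```python
def yolov3_output_shapes(img_size, num_classes, strides=(32, 16, 8),
                         boxes_per_scale=3):
    """Per-scale (N, N, channels) shapes for a YOLOv3-style head."""
    channels = boxes_per_scale * (4 + 1 + num_classes)  # offsets, objectness, classes
    shapes = []
    for s in strides:
        n = img_size // s  # grid size at this scale (assumes img_size % 32 == 0)
        shapes.append((n, n, channels))
    return shapes


def total_predictions(img_size, strides=(32, 16, 8), boxes_per_scale=3):
    """Number of candidate boxes summed over all three scales."""
    return sum((img_size // s) ** 2 * boxes_per_scale for s in strides)


if __name__ == "__main__":
    # For COCO (80 classes) at 416 x 416: 13x13, 26x26 and 52x52 grids.
    print(yolov3_output_shapes(416, 80), total_predictions(416))
```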
  • 14. YOLO v3
    • The backbone is a hybrid between the network used in YOLOv2 (Darknet-19) and residual networks.
    • It uses successive 3 × 3 and 1 × 1 convolutional layers with some shortcut connections;
    • It has 53 convolutional layers and is called Darknet-53;
    • At 320 × 320, YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster.
  • 15. CornerNet: Detecting Objects as Paired Keypoints
    • CornerNet detects an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolutional neural network.
    • Detecting objects as paired keypoints eliminates the need to design the set of anchor boxes commonly used in prior single-stage detectors.
    • In addition, CornerNet introduces corner pooling, a type of pooling layer that helps the network better localize corners.
    Bounding box predictions overlaid on the predicted corner heatmaps.
  • 16. CornerNet: Detecting Objects as Paired Keypoints Detect an object as a pair of bounding box corners grouped together. A convolutional network outputs a heatmap for all top-left corners, a heatmap for all bottom-right corners, and an embedding vector for each detected corner. The network is trained to predict similar embeddings for corners from the same object.
  • 17. CornerNet: Detecting Objects as Paired Keypoints Corner pooling: for each channel, we take the maximum values (red dots) in two directions (red lines), each from a separate feature map, and add the two maximums together (blue dot). “Ground-truth” heatmaps for training.
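Top-left corner pooling can be sketched with NumPy: at each location it takes the maximum of one feature map over all positions below it in the same column, the maximum of a second map over all positions to its right in the same row, and adds the two. Looking down and to the right is how we understand CornerNet's top-left pooling; the bottom-right variant mirrors it by looking up and to the left. The function names are hypothetical, and this is a reference sketch rather than an efficient kernel.

```python
import numpy as np


def top_left_corner_pool(f_t, f_l):
    """Top-left corner pooling on two (H, W) feature maps.

    out[i, j] = max(f_t[i:, j]) + max(f_l[i, j:])
    """
    # Reverse cumulative max: flip, accumulate the running max, flip back.
    down = np.flip(np.maximum.accumulate(np.flip(f_t, axis=0), axis=0), axis=0)
    right = np.flip(np.maximum.accumulate(np.flip(f_l, axis=1), axis=1), axis=1)
    return down + right


def bottom_right_corner_pool(f_b, f_r):
    """Bottom-right corner pooling: out[i, j] = max(f_b[:i+1, j]) + max(f_r[i, :j+1])."""
    up = np.maximum.accumulate(f_b, axis=0)    # running max from the top row down
    left = np.maximum.accumulate(f_r, axis=1)  # running max from the left column
    return up + left
```

Corner pooling is needed because a corner usually lies outside the object, so there is little local evidence at the corner pixel itself; the directional maxima pull in evidence from the object's boundaries.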
  • 18. CornerNet: Detecting Objects as Paired Keypoints The backbone network is followed by two prediction modules, one for the top-left corners and the other for the bottom-right corners. Using the predictions from both modules, we locate and group the corners. A “pull” loss trains the network to group the corners of the same object, and a “push” loss separates the corners of different objects.
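The pull and push losses can be sketched as follows, assuming 1-D embeddings as in the CornerNet paper. For N objects, e_k is the mean of the top-left embedding e_tk and the bottom-right embedding e_bk; pull = (1/N) Σ_k [(e_tk − e_k)² + (e_bk − e_k)²], and push = 1/(N(N − 1)) Σ_k Σ_{j≠k} max(0, Δ − |e_k − e_j|) with margin Δ = 1. The function name `pull_push_loss` is hypothetical.

```python
def pull_push_loss(e_tl, e_br, delta=1.0):
    """Return (pull, push) for lists of corner embeddings, one pair per object."""
    n = len(e_tl)
    centers = [(a + b) / 2 for a, b in zip(e_tl, e_br)]
    # Pull: draw both corners of each object toward their mean embedding.
    pull = sum((a - c) ** 2 + (b - c) ** 2
               for a, b, c in zip(e_tl, e_br, centers)) / n
    # Push: penalize object centers closer than the margin delta.
    push = 0.0
    if n > 1:
        for k in range(n):
            for j in range(n):
                if j != k:
                    push += max(0.0, delta - abs(centers[k] - centers[j]))
        push /= n * (n - 1)
    return pull, push
```

At test time, corners are paired when their embeddings are close, so these two terms directly shape the grouping distance used at inference.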
  • 19. ExtremeNet: Bottom-up Object Detection by Grouping Extreme and Center Points
    • Bottom-up approaches can still perform competitively with respect to top-down approaches.
    • ExtremeNet detects four extreme points (top-most, left-most, bottom-most, right-most) and one center point of each object using a standard keypoint estimation network.
    • It groups the five keypoints into a bounding box if they are geometrically aligned.
    • Object detection then becomes a purely appearance-based keypoint estimation problem, without region classification or implicit feature learning.
  • 20. ExtremeNet: Bottom-up Object Detection by Grouping Extreme and Center Points The network predicts four extreme point heatmaps (top: the heatmaps overlaid on the input image) and one center heatmap (bottom row, left) for each category. The peaks (middle left) of the four extreme point heatmaps are combined, and the geometric center of each composed bounding box is computed (middle right). A bounding box is produced if and only if its geometric center has a high response in the center heatmap (bottom right).
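The center grouping step can be sketched as a brute-force search over all combinations of extreme-point peaks for one category. Several details are assumptions, not taken from the slides: peaks are (x, y, score) triples on the heatmap grid, the geometric-alignment test simply requires each extreme point to lie on its own side of the box, the center is rounded down to a grid cell, and the box score is the mean of the five scores. `group_extreme_points` and `center_thr` are hypothetical names.

```python
from itertools import product


def group_extreme_points(tops, lefts, bottoms, rights, center_heat, center_thr):
    """Return ((x1, y1, x2, y2), score) for every accepted peak combination."""
    boxes = []
    for t, l, b, r in product(tops, lefts, bottoms, rights):
        # Assumed alignment check: each extreme point lies on its own side.
        if not (t[1] <= l[1] <= b[1] and t[1] <= r[1] <= b[1]
                and l[0] <= t[0] <= r[0] and l[0] <= b[0] <= r[0]):
            continue
        cx = (l[0] + r[0]) // 2  # geometric center of the composed box
        cy = (t[1] + b[1]) // 2
        c_score = center_heat[cy][cx]
        if c_score < center_thr:
            continue  # the center heatmap does not support this box
        score = (t[2] + l[2] + b[2] + r[2] + c_score) / 5
        boxes.append(((l[0], t[1], r[0], b[1]), score))
    return boxes
```

The search is O(n⁴) in the number of peaks per heatmap, which is why keeping only the top few peaks per category matters in practice.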
  • 21. ExtremeNet: Bottom-up Object Detection by Grouping Extreme and Center Points The network takes an image as input and produces four C-channel extreme point heatmaps, one C-channel center heatmap, and four 2-channel category-agnostic offset maps. The heatmaps are trained with a weighted pixel-wise logistic regression, where the weight reduces the false-positive penalty near the ground-truth location, and the offset maps are trained with a Smooth L1 loss applied at the ground-truth peak locations.
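The weighted pixel-wise logistic regression can be sketched as the penalty-reduced focal loss introduced by CornerNet, which, as far as we know, ExtremeNet reuses. A pixel whose target is exactly 1 contributes (1 − p)^α log p; every other pixel contributes (1 − y)^β p^α log(1 − p), and the sum is divided by the number of positives. The (1 − y)^β factor is the weight that shrinks the false-positive penalty near a ground-truth peak, where the Gaussian target y is close to 1. α = 2 and β = 4 follow CornerNet; the Gaussian width and the function names are assumptions.

```python
import numpy as np


def gaussian_heatmap(h, w, center, sigma):
    """Unnormalized 2-D Gaussian target that equals 1.0 at the keypoint."""
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))


def penalty_reduced_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-12):
    """Focal-style logistic loss with reduced penalty near ground-truth peaks."""
    pred = np.clip(pred, eps, 1 - eps)
    pos = target == 1.0  # exact peaks only
    pos_loss = ((1 - pred) ** alpha * np.log(pred))[pos].sum()
    neg_weight = (1 - target) ** beta  # close to 0 near a peak
    neg_loss = (neg_weight * pred ** alpha * np.log(1 - pred))[~pos].sum()
    num_pos = max(int(pos.sum()), 1)
    return -(pos_loss + neg_loss) / num_pos
```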
  • 22. ExtremeNet: Bottom-up Object Detection by Grouping Extreme and Center Points When multiple points are equally extreme along one edge, the model predicts a segment of low-confidence responses (a). Edge aggregation enhances the confidence of the middle pixel (b).
  • 23. FSAF: Feature Selective Anchor-Free Module
    • The feature selective anchor-free (FSAF) module can be plugged into single-shot detectors with a feature pyramid structure (FPN).
    • The FSAF module avoids two limitations of anchor-based detection:
      1) heuristic-guided feature selection;
      2) overlap-based anchor sampling.
    • The general concept of the FSAF module is online feature selection applied to the training of multi-level anchor-free branches.
    • Specifically, an anchor-free branch is attached to each level of the feature pyramid, allowing box encoding and decoding in an anchor-free manner at an arbitrary level.
    • During training, each instance is dynamically assigned to the most suitable feature level.
    • At inference time, the FSAF module can work jointly with anchor-based branches by outputting predictions in parallel.
  • 24. FSAF: Feature Selective Anchor-Free Module The feature level selected for anchor-based branches may not be optimal. FSAF module plugged into conventional anchor-based detection methods. During training, each instance is assigned to a pyramid level via feature selection to set up supervision.
  • 25. FSAF: Feature Selective Anchor-Free Module Supervision for an instance in one feature level of the anchor-free branches: focal loss is used for classification and IoU loss for box regression. Online feature selection mechanism: each instance is passed through all levels of the anchor-free branches to compute the averaged classification (focal) loss and regression (IoU) loss over its effective regions. The level with the minimal sum of the two losses is then selected to set up the supervision signals for that instance.
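The online feature selection reduces to an argmin over pyramid levels once the per-level losses are known. The sketch below also shows one way to compute an instance's effective region on a level, as a shrunken box around the instance center; the shrink factor `eps_e = 0.2` and both function names are assumptions for illustration, and computing the actual focal and IoU losses is left to the detector.

```python
def select_level(cls_losses, reg_losses):
    """Index of the pyramid level with the smallest summed loss for one instance.

    cls_losses[l], reg_losses[l]: focal and IoU losses of the instance,
    averaged over its effective region on level l.
    """
    totals = [c + r for c, r in zip(cls_losses, reg_losses)]
    return min(range(len(totals)), key=totals.__getitem__)


def effective_region(box, stride, eps_e=0.2):
    """Shrunken center region of `box` (x1, y1, x2, y2), in level coordinates."""
    x1, y1, x2, y2 = (v / stride for v in box)  # project onto this level
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) * eps_e / 2
    half_h = (y2 - y1) * eps_e / 2
    return cx - half_w, cy - half_h, cx + half_w, cy + half_h
```

Because the selection uses the losses the network currently produces, an instance's level can change during training, which is the "online" part in contrast to fixed size-based rules.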
  • 26. FSAF: Feature Selective Anchor-Free Module Network architecture of RetinaNet with FSAF module. The FSAF module only introduces two additional conv layers (dashed feature maps) per pyramid level, keeping the architecture fully convolutional.
  • 27. FCOS: Fully Convolutional One-Stage Object Detection
    • FCOS is a fully convolutional one-stage object detector that solves object detection by per-pixel prediction, analogous to semantic segmentation.
    • FCOS is anchor-box free as well as proposal free.
    • By eliminating the predefined set of anchor boxes, FCOS avoids the complicated computation related to anchor boxes, such as calculating overlaps during training, and significantly reduces the training memory footprint.
    • It also avoids all hyper-parameters related to anchor boxes, to which the final detection performance is often very sensitive.
    • With NMS as its only post-processing step, FCOS outperforms previous anchor-based one-stage detectors while being much simpler.
  • 28. FCOS: Fully Convolutional One-Stage Object Detection The network architecture of FCOS, where C3, C4, and C5 denote the feature maps of the backbone network and P3 to P7 are the feature levels used for the final prediction. H × W is the height and width of feature maps. ‘/s’ (s = 8, 16, ..., 128) is the down-sampling ratio of the level of feature maps to the input image. As an example, all the numbers are computed with an 800 × 1024 input.
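The per-pixel regression targets can be sketched as follows. A location (x, y) on a level with stride s maps back to (⌊s/2⌋ + x·s, ⌊s/2⌋ + y·s) in the input image; if that point lies inside a ground-truth box (x0, y0, x1, y1), its targets are l = x − x0, t = y − y0, r = x1 − x, b = y1 − y, and it is assigned to the level whose size range contains max(l, t, r, b). The range bounds 0, 64, 128, 256, 512, ∞ for P3 to P7, the half-open range convention, the rounding in `level_sizes` and all helper names are assumptions; locations inside several boxes are not handled.

```python
def fcos_locations(h, w, stride):
    """Input-image coordinates (x, y) of every location on one feature level."""
    return [(stride // 2 + x * stride, stride // 2 + y * stride)
            for y in range(h) for x in range(w)]


def fcos_target(loc, box, bounds=(0, 64, 128, 256, 512, float("inf"))):
    """Return ((l, t, r, b), level) for a location inside `box`, else None.

    Level 0 corresponds to P3 and level 4 to P7 (assumed bounds).
    """
    x, y = loc
    x0, y0, x1, y1 = box
    l, t, r, b = x - x0, y - y0, x1 - x, y1 - y
    if min(l, t, r, b) <= 0:
        return None  # outside the box: a negative (background) location
    m = max(l, t, r, b)
    for level in range(len(bounds) - 1):
        if bounds[level] < m <= bounds[level + 1]:
            return (l, t, r, b), level
    return None


def level_sizes(h, w, strides=(8, 16, 32, 64, 128)):
    """Feature map sizes for P3-P7, assuming each level rounds up."""
    return [(-(-h // s), -(-w // s)) for s in strides]
```

Limiting the regression range per level is what lets FCOS send large and small objects that overlap at the same pixel to different pyramid levels, reducing ambiguity without anchors.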
  • 29. FCOS: Fully Convolutional One-Stage Object Detection ResNet-50 is used as the backbone. As shown in the figure, FCOS works well with a wide range of objects including crowded, occluded, highly overlapped, extremely small and very large objects.
  • 30. FoveaBox: Beyond Anchor-based Object Detector
    • FoveaBox is an accurate and anchor-free framework for object detection.
    • Anchor-based object detectors are limited by the design of their anchors.
    • FoveaBox directly learns the probability that an object exists and the bounding box coordinates, without anchor references, by:
      (a) predicting category-sensitive semantic maps for the object existence probability;
      (b) producing a category-agnostic bounding box for each position as an object candidate.
    • The scales of target boxes are naturally associated with the feature pyramid representations of each input image.
    • For objects with arbitrary aspect ratios, FoveaBox brings significant improvements compared to anchor-based detectors.
    • FoveaBox shows strong robustness and generalization to changes in the distribution of bounding box shapes.
  • 31. FoveaBox: Beyond Anchor-based Object Detector FoveaBox object detector. For each output spatial position that potentially presents an object, FoveaBox directly predicts the confidences for all target categories and the bounding box. FoveaBox network architecture. FoveaBox uses an FPN backbone on top of a feed-forward ResNet architecture. To this backbone, FoveaBox attaches two subnetworks, one for classification and one for bounding box prediction.
  • 32. FoveaBox: Beyond Anchor-based Object Detector These results are based on ResNet-101, achieving a single model box AP of 38.9.
  • 33. Center and Scale Prediction: A Box-free Approach for Object Detection  CSP scans for feature points all over the image, a task to which convolution is naturally suited.  It goes for a higher-level abstraction: it looks for central points where there are objects, which deep models capable of high-level semantic abstraction can detect.  It also predicts the scales of the central points, which is likewise a straightforward convolution.  Object detection is thus simplified to a straightforward center and scale prediction task through convolutions.  Though structurally simple, it achieves competitive accuracy on several challenging benchmarks, such as pedestrian detection and face detection.  A cross-dataset evaluation is performed to assess the method's generalization ability.
  • 34. Center and Scale Prediction: A Box-free Approach for Object Detection The overall pipeline of the proposed CSP (Center and Scale Prediction) detector. The final convolutions have two channels: one is a heatmap indicating the locations of the centers (red dots), and the other predicts the scales (yellow dotted lines) for each detected center.
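To make the two-channel output concrete, here is a minimal decoding sketch. Several details are assumptions rather than statements from the slides: the scale channel holds log(height), boxes use a fixed width/height ratio of 0.41 (a common pedestrian setting), the output stride is 4, and any offset branch of the full detector is omitted.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def is_local_max(grid: Grid, i: int, j: int) -> bool:
    """True if grid[i][j] is the largest value in its 3 x 3 neighbourhood."""
    rows, cols = len(grid), len(grid[0])
    window = [grid[a][b]
              for a in range(max(0, i - 1), min(rows, i + 2))
              for b in range(max(0, j - 1), min(cols, j + 2))]
    return grid[i][j] >= max(window)


def decode_csp(center_map: Grid, scale_map: Grid, stride: int = 4,
               threshold: float = 0.5, aspect_ratio: float = 0.41) -> List[Tuple[float, ...]]:
    """Turn center-heatmap peaks plus predicted log-heights into (x0, y0, x1, y1, score)."""
    boxes = []
    for i, row in enumerate(center_map):
        for j, score in enumerate(row):
            if score < threshold or not is_local_max(center_map, i, j):
                continue
            h = math.exp(scale_map[i][j])      # assumed: scale channel = log(height in pixels)
            w = aspect_ratio * h               # assumed fixed width/height ratio
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score))
    return boxes
```

Each detected center yields exactly one box, so no anchor matching is involved at any point.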
  • 35. Center and Scale Prediction: A Box-free Approach for Object Detection Overall architecture of CSP, which mainly comprises two components, i.e. the feature extraction module and the detection head. The feature extraction module concatenates feature maps of different resolutions into a single one. The detection head merely contains a 3x3 convolutional layer, followed by two prediction layers, one for the center location and the other for the corresponding scale.
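The feature extraction module's "concatenate maps of different resolutions into one" step can be sketched on nested lists (channels × H × W). This is a simplification: nearest-neighbour upsampling is swapped in for whatever learned upsampling the real network uses, and any normalization before concatenation is left out. The function names are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # channels x H x W


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Repeat every pixel factor x factor times (nearest-neighbour upsampling)."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(factor)]
            rows.extend([list(wide) for _ in range(factor)])
        out.append(rows)
    return out


def concat_multiscale(features) -> FeatureMap:
    """features: list of (fmap, factor); upsample each and stack along the channel axis."""
    fused: FeatureMap = []
    for fmap, factor in features:
        up = upsample_nearest(fmap, factor)
        if fused and (len(up[0]), len(up[0][0])) != (len(fused[0]), len(fused[0][0])):
            raise ValueError("feature maps do not align after upsampling")
        fused.extend(up)
    return fused
```

The fused map is what the single 3 × 3 convolution of the detection head would then consume.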
  • 36. Region Proposal by Guided Anchoring (GA-RPN)  Guided Anchoring leverages semantic features to guide the anchoring.  The method jointly predicts the locations where the centers of objects of interest are likely to exist, as well as the scales and aspect ratios at different locations.  On top of the predicted anchor shapes, a feature adaption module mitigates the feature inconsistency.  The resulting high-quality proposals are used to improve detection performance.  The anchoring scheme can be seamlessly integrated into proposal methods and detectors.  Code: //
  • 37. Region Proposal by Guided Anchoring (GA-RPN) GA-RPN framework. For each output feature map in the feature pyramid, use an anchor generation module with two branches to predict the anchor location and shape, respectively. Then a feature adaption module is applied to the original feature map to make the new feature map aware of anchor shapes.
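The anchor generation module can be sketched as below. The mapping from the shape branch output (dw, dh) to anchor size, w = σ·s·exp(dw) with stride s and σ = 8, is our reading of the GA-RPN paper and should be treated as an assumption, as should the threshold and the cell-center convention. The feature adaption module is not shown.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def guided_anchors(loc_prob: Grid, dw: Grid, dh: Grid, stride: int,
                   sigma: float = 8.0, threshold: float = 0.5) -> List[Tuple[float, float, float, float]]:
    """Keep cells whose location probability passes the threshold and give each a learned shape."""
    anchors = []
    for i, row in enumerate(loc_prob):
        for j, p in enumerate(row):
            if p < threshold:
                continue                                 # location branch: no anchor here
            w = sigma * stride * math.exp(dw[i][j])      # shape branch output -> width
            h = sigma * stride * math.exp(dh[i][j])      # shape branch output -> height
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


print(guided_anchors([[0.9, 0.1]], [[0.0, 0.0]], [[0.0, 0.0]], stride=8))
# [(-28.0, -28.0, 36.0, 36.0)]
```

Unlike a sliding-window RPN, only the cells passing the location branch produce an anchor, and each gets a single shape instead of a fixed set of scales and ratios.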
  • 38. Region Proposal by Guided Anchoring (GA-RPN) Anchor location target for multi-level features. Ground-truth objects are assigned to different feature levels according to their scales, and the center region (CR), ignore region (IR) and outside region (OR) are defined respectively.
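The three regions can be illustrated for a single feature level. In this sketch, the CR is the ground-truth box shrunk around its center by σ1, the IR is the box shrunk by σ2 minus the CR, and everything else is OR. The values σ1 = 0.2 and σ2 = 0.5 are assumptions, and the multi-level part of the figure (where the IR also extends into adjacent levels) is left out.

```python
from typing import List, Tuple


def region_labels(box: Tuple[float, float, float, float], height: int, width: int,
                  stride: int, sigma1: float = 0.2, sigma2: float = 0.5) -> List[List[int]]:
    """Label each feature-map cell: 1 = CR (positive), -1 = IR (ignored), 0 = OR (negative)."""
    x0, y0, x1, y1 = (c / stride for c in box)     # box in feature-map coordinates
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    bw, bh = x1 - x0, y1 - y0

    def inside(i: int, j: int, s: float) -> bool:
        # Is the cell center inside the box shrunk around its center by factor s?
        px, py = j + 0.5, i + 0.5
        return abs(px - cx) <= s * bw / 2 and abs(py - cy) <= s * bh / 2

    labels = [[0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            if inside(i, j, sigma1):
                labels[i][j] = 1
            elif inside(i, j, sigma2):
                labels[i][j] = -1
    return labels
```

For an 80 × 80 box on a 10 × 10, stride-8 map, this gives 4 CR cells, 32 IR cells and 64 OR cells.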
  • 39. Region Proposal by Guided Anchoring (GA-RPN) Examples of RPN proposals (top row) and GA-RPN proposals (bottom row).
  • 40. CenterNet: Objects as Points  Detection identifies objects as axis-aligned boxes in an image.  CenterNet instead models an object as a single point: the center point of its bounding box.  The detector uses keypoint estimation to find center points and regresses all other object properties, such as size, 3D location, orientation, and even pose.  The center-point-based approach is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding-box-based detectors.  The bounding box size and other object properties are inferred from the keypoint feature at the center.
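Because every object is a single peak, decoding can use a local-maximum test (equivalent to 3 × 3 max pooling) instead of box NMS. A minimal sketch for one class, assuming an output stride of 4 and sizes regressed in output-map units, and leaving out the offset branch; `decode_centernet` is a hypothetical name.

```python
from typing import List

Grid = List[List[float]]


def decode_centernet(heatmap: Grid, sizes, k: int = 100, stride: int = 4):
    """Return up to k boxes (x0, y0, x1, y1, score) in input-image pixels.

    heatmap: one class's center heatmap at output resolution.
    sizes:   sizes[i][j] = (w, h) regressed at cell (i, j), in output-map units.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for i in range(rows):
        for j in range(cols):
            window = [heatmap[a][b]
                      for a in range(max(0, i - 1), min(rows, i + 2))
                      for b in range(max(0, j - 1), min(cols, j + 2))]
            # A cell survives 3 x 3 max pooling only if it equals the window maximum.
            if heatmap[i][j] == max(window):
                peaks.append((heatmap[i][j], i, j))
    peaks.sort(reverse=True)
    boxes = []
    for score, i, j in peaks[:k]:
        w, h = sizes[i][j]
        x0, y0, x1, y1 = j - w / 2, i - h / 2, j + w / 2, i + h / 2
        boxes.append((x0 * stride, y0 * stride, x1 * stride, y1 * stride, score))
    return boxes
```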
  • 41. CenterNet: Objects as Points Difference between anchor-based detectors (a) and the center point detector (b). (a) Standard anchor-based detection: anchors count as positive with an overlap IoU > 0.7 to any object, negative with an overlap IoU < 0.3, and are ignored otherwise. (b) Center point based detection: the center pixel is assigned to the object, nearby points have a reduced negative loss, and the object size is regressed.
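The training target in (b) can be sketched as follows: the center is marked with a Gaussian peak of value 1, and a penalty-reduced focal loss down-weights the negative loss near the center through a (1 − Y)^β factor. The values α = 2 and β = 4 are the ones used by CornerNet and CenterNet as far as we know; here σ is a plain parameter, whereas CenterNet derives it from the object size.

```python
import math


def gaussian_heatmap(height, width, center, sigma):
    """Target heatmap with value 1.0 at the center cell, decaying as a Gaussian."""
    cx, cy = center
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)] for y in range(height)]


def penalty_reduced_focal_loss(pred, target, alpha=2, beta=4, eps=1e-6):
    """Focal loss over a heatmap, normalized by the number of object centers."""
    pos_loss = neg_loss = 0.0
    num_pos = 0
    for prow, trow in zip(pred, target):
        for p, y in zip(prow, trow):
            p = min(max(p, eps), 1 - eps)          # keep log() finite
            if y == 1:
                pos_loss += (1 - p) ** alpha * math.log(p)
                num_pos += 1
            else:
                # (1 - y)^beta shrinks the penalty for negatives close to a center
                neg_loss += (1 - y) ** beta * p ** alpha * math.log(1 - p)
    return -(pos_loss + neg_loss) / max(num_pos, 1)
```

With this target, a spurious 0.5 prediction right next to the center costs far less than the same prediction in a far corner.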
  • 42. CenterNet: Objects as Points Model diagrams. The numbers denote the stride. (a): Hourglass Network as used in CornerNet. (b): ResNet with transposed convolutions. One 3 × 3 deformable convolutional layer is added before each up-sampling layer. Specifically, deformable convolution first changes the number of channels, and then transposed convolution upsamples the feature map (these two steps are shown separately for 32 → 16, and together as a dashed arrow for 16 → 8 and 8 → 4). (c): The original DLA-34 (Deep Layer Aggregation) for semantic segmentation. (d): Modified DLA-34, with more skip connections from the bottom layers and every convolutional layer in the upsampling stages upgraded to a deformable convolutional layer.
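The strides 32 → 16 → 8 → 4 can be checked with the usual transposed-convolution size formula. This sketch assumes 4 × 4 kernels with stride 2 and padding 1 and a 512 × 512 input; those are illustrative choices, not values given on the slide.

```python
def deconv_out(size: int, kernel: int = 4, stride: int = 2,
               padding: int = 1, output_padding: int = 0) -> int:
    """Output length of a transposed convolution along one axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def upsample_chain(input_size: int = 512, backbone_stride: int = 32, steps: int = 3):
    """Feature-map sizes from the stride-32 backbone output through each upsampling step."""
    size = input_size // backbone_stride
    sizes = [size]
    for _ in range(steps):
        size = deconv_out(size)
        sizes.append(size)
    return sizes


print(upsample_chain())             # [16, 32, 64, 128]
print(512 // upsample_chain()[-1])  # 4, i.e. the output stride
```

With these settings each transposed convolution exactly doubles the resolution, so three of them take the stride-32 backbone output to a stride-4 heatmap.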
  • 43. CenterNet: Object Detection with Keypoint Triplets  An efficient solution that explores the visual patterns within each cropped region at minimal cost.  The framework is built upon a representative one-stage keypoint-based detector, CornerNet.  CenterNet detects each object as a triplet, rather than a pair, of keypoints, which improves both precision and recall.  Two customized modules, cascade corner pooling and center pooling, respectively enrich the information collected by the top-left and bottom-right corners and provide more recognizable information at the central regions.
  • 44. CenterNet: Object Detection with Keypoint Triplets Architecture of CenterNet. A convolutional backbone network applies cascade corner pooling and center pooling to output two corner heatmaps and a center keypoint heatmap, respectively. Similar to CornerNet, a pair of detected corners with similar embeddings is used to detect a potential bounding box. Then the detected center keypoints are used to determine the final bounding boxes.
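The final filtering step can be sketched as follows: a candidate box from a corner pair is kept only if a center keypoint of the same class falls inside its central region. The central-region formula below, which for a scale factor n keeps the middle 1/n of the box in each direction, is our reading of the paper; n = 3 is an assumption, and the rescoring of kept boxes is omitted.

```python
def central_region(box, n):
    """Middle 1/n of (tlx, tly, brx, bry) in each direction."""
    tlx, tly, brx, bry = box
    return (((n + 1) * tlx + (n - 1) * brx) / (2 * n),
            ((n + 1) * tly + (n - 1) * bry) / (2 * n),
            ((n - 1) * tlx + (n + 1) * brx) / (2 * n),
            ((n - 1) * tly + (n + 1) * bry) / (2 * n))


def filter_with_centers(boxes, centers, n=3):
    """boxes: (tlx, tly, brx, bry, cls, score); centers: (x, y, cls)."""
    kept = []
    for box in boxes:
        x0, y0, x1, y1 = central_region(box[:4], n)
        if any(c[2] == box[4] and x0 <= c[0] <= x1 and y0 <= c[1] <= y1 for c in centers):
            kept.append(box)
    return kept


print(central_region((0, 0, 90, 90), 3))  # (30.0, 30.0, 60.0, 60.0)
```

The check is cheap because it only inspects one small region per candidate, which is how the triplet removes corner pairs that do not enclose an object.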
  • 45. CenterNet: Object Detection with Keypoint Triplets (a) Center pooling takes the max values in both horizontal and vertical directions. (b) Corner pooling only takes the max values in boundary directions. (c) Cascade corner pooling takes the max values in both boundary directions and internal directions of objects. The structures of the center pooling module (a) and the cascade top corner pooling module (b). Center pooling and cascade corner pooling are both implemented by combining corner pooling in different directions.
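The plain pooling variants can be sketched on a single-channel map stored as nested lists. `top_pool` and `left_pool` give each position the maximum over everything below it and to its right, respectively, which is what a top-left corner needs; `center_pool` adds the maximum of the whole row to the maximum of the whole column. The function names are hypothetical, and the cascade variant is not shown.

```python
def top_pool(fmap):
    """Each cell becomes the max of itself and everything below it in its column."""
    out = [row[:] for row in fmap]
    for i in range(len(out) - 2, -1, -1):
        for j in range(len(out[0])):
            out[i][j] = max(out[i][j], out[i + 1][j])
    return out


def left_pool(fmap):
    """Each cell becomes the max of itself and everything to its right in its row."""
    out = [row[:] for row in fmap]
    for row in out:
        for j in range(len(row) - 2, -1, -1):
            row[j] = max(row[j], row[j + 1])
    return out


def top_left_corner_pool(f_top, f_left):
    """Corner pooling: sum of the two directional scans."""
    t, l = top_pool(f_top), left_pool(f_left)
    return [[a + b for a, b in zip(rt, rl)] for rt, rl in zip(t, l)]


def center_pool(f_h, f_v):
    """Max over the whole row of f_h plus max over the whole column of f_v."""
    row_max = [max(row) for row in f_h]
    col_max = [max(col) for col in zip(*f_v)]
    return [[row_max[i] + col_max[j] for j in range(len(f_v[0]))] for i in range(len(f_h))]


f = [[1, 0, 0], [0, 0, 5], [0, 3, 0]]
print(top_left_corner_pool(f, f))  # [[2, 3, 5], [5, 8, 10], [3, 6, 0]]
print(center_pool(f, f))           # [[2, 4, 6], [6, 8, 10], [4, 6, 8]]
```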
  • 46. CornerNet-Lite: Efficient Keypoint Based Object Detection  CornerNet-Lite is a combination of two efficient variants of CornerNet: CornerNet-Saccade, which uses an attention mechanism to eliminate the need for exhaustively processing all pixels of the image, and CornerNet-Squeeze, which introduces a new compact backbone architecture.  Together these two variants address the two critical use cases in efficient object detection: improving efficiency without sacrificing accuracy, and improving accuracy at real-time efficiency.  CornerNet-Saccade is suitable for offline processing, improving the efficiency of CornerNet by 6.0x and the AP by 1.0% on COCO.  CornerNet-Squeeze is suitable for real-time detection, improving both the efficiency and accuracy of the popular real-time detector YOLOv3 (34.4% AP at 34ms for CornerNet-Squeeze compared to 33.0% AP at 39ms for YOLOv3 on COCO).
  • 47. CornerNet-Lite: Efficient Keypoint Based Object Detection Overview of CornerNet-Saccade. Predict a set of possible object locations from the attention maps and bounding boxes generated on a downsized full image. Zoom into each location and crop a small region around that location. Then detect objects in each region. Control the efficiency by ranking the object locations and choosing top k locations to process. Finally, merge the detections by NMS.
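The selection-and-merge loop can be sketched as follows. It only shows the control flow: `detect_in_crop` is a hypothetical callback standing in for cropping, zooming and running the detector on a region, and plain greedy NMS with an IoU threshold of 0.5 is an assumed choice for the merge step.

```python
from typing import Callable, List, Tuple

Det = Tuple[float, float, float, float, float]  # (x0, y0, x1, y1, score)


def iou(a: Det, b: Det) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Det], thresh: float = 0.5) -> List[Det]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it by >= thresh."""
    kept: List[Det] = []
    for d in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(iou(d, k) < thresh for k in kept):
            kept.append(d)
    return kept


def saccade(locations: List[Tuple[float, float, float]],
            detect_in_crop: Callable[[float, float], List[Det]],
            k: int = 3, thresh: float = 0.5) -> List[Det]:
    """Process only the top-k attention locations, then merge their detections with NMS."""
    top = sorted(locations, key=lambda loc: loc[2], reverse=True)[:k]
    dets: List[Det] = []
    for x, y, _ in top:
        dets.extend(detect_in_crop(x, y))   # stand-in for zooming in and running the detector
    return nms(dets, thresh)
```

Raising or lowering k is the efficiency knob the slide describes: fewer locations means fewer crops to process.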
  • 48. CornerNet-Lite: Efficient Keypoint Based Object Detection  In contrast to CornerNet-Saccade, which focuses on a subset of the pixels to reduce the amount of processing, CornerNet-Squeeze explores an alternative approach: reducing the amount of processing per pixel.  In CornerNet, most of the computational resources are spent on Hourglass-104.  Hourglass-104 is built from residual blocks, each consisting of two 3 × 3 convolution layers and a skip connection.  Although Hourglass-104 achieves competitive performance, it is expensive in terms of number of parameters and inference time.  To reduce its complexity, CornerNet-Squeeze incorporates ideas from SqueezeNet and MobileNets to design a lightweight hourglass architecture.  SqueezeNet uses three strategies to reduce network complexity: (1) replacing 3 × 3 kernels with 1 × 1 kernels; (2) decreasing the number of input channels to 3 × 3 kernels; (3) downsampling late.
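A quick parameter count shows why strategies (1) and (2) help. The sketch compares a residual block with two 3 × 3 convolutions against a plain SqueezeNet-style fire module. The channel numbers (256 in, squeeze to 32, expand to 128 + 128) are hypothetical, and CornerNet-Squeeze's actual fire module (which, as we understand it, also borrows depthwise separable convolutions from MobileNets) is not reproduced exactly.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def residual_block_params(c: int) -> int:
    """Two 3 x 3 convolutions, C -> C, as in the Hourglass-104 residual block."""
    return 2 * conv_params(3, c, c)


def fire_module_params(c_in: int, squeeze: int, expand1: int, expand3: int) -> int:
    """Fire module: 1 x 1 squeeze, then parallel 1 x 1 and 3 x 3 expand convolutions."""
    return (conv_params(1, c_in, squeeze)
            + conv_params(1, squeeze, expand1)
            + conv_params(3, squeeze, expand3))


res = residual_block_params(256)
fire = fire_module_params(256, squeeze=32, expand1=128, expand3=128)
print(res, fire, res // fire)  # 1179648 49152 24
```

Most of the saving comes from the squeeze layer: the 3 × 3 kernels now see 32 input channels instead of 256.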
  • 49. CornerNet-Lite: Efficient Keypoint Based Object Detection Qualitative examples on COCO validation set.