[course site]
Verónica Vilaplana
veronica.vilaplana@upc.edu
Associate Professor
Universitat Politècnica de Catalunya
Technical University of Catalonia
Segmentation
Day 3 Lecture 1
#DLUPC
Outline
● What is object segmentation?
○ Applications
● Semantic segmentation
○ From image classification to semantic segmentation
■ Fully convolutional networks
■ Learnable Upsampling
■ Skip connections
○ FCN8s, Dilated Convolutions, U-Net, ...
● Instance segmentation
○ Simultaneous Detection and Segmentation
○ Mask R-CNN
Image and object segmentation
● Image Segmentation
○ Group pixels into regions that share some similar properties
● Segmenting images into meaningful objects
○ Object-level segmentation: accurate localization and recognition
Superpixels (Ren, ICCV 2003)
Object segmentation: applications
● Image editing and composition (Xu, 2016)
● Robotics
● Autonomous driving (Cordts, 2016)
● Medical image analysis (Casamitjana, 2017)
Semantic segmentation
● Label every pixel: recognize the class of every pixel
● Do not differentiate instances
Mottaghi et al, “The role of context for object detection and semantic segmentation in the wild”, CVPR 2014
Instance segmentation
● Detect instances, categorize and label every pixel
● Labels are class-aware and instance-aware
Arnab, Torr, “Pixelwise instance segmentation with a dynamically instantiated network”, CVPR 2017
[Figure panels: object detection, semantic segmentation, instance segmentation, ground truth]
Datasets for semantic/instance segmentation
Pascal Visual Object Classes
● 20 categories
● +10,000 images
● Semantic segmentation GT
● Instance segmentation GT
Pascal Context
● Real indoor & outdoor scenes
● 540 categories
● +10,000 images
● Dense annotations
● Semantic segmentation GT
● Objects + stuff
Datasets for semantic/instance segmentation
SUN RGB-D
● Real indoor scenes
● 10,000 images
● 58,658 3D bounding boxes
● Dense annotations
● Instances GT
● Semantic segmentation GT
● Objects + stuff
COCO (Common Objects in Context)
● Real indoor & outdoor scenes
● 80 categories
● +300,000 images
● 2M instances
● Partial annotations
● Semantic segmentation GT
● Instance segmentation GT
● Objects, but no stuff
Datasets for semantic/instance segmentation
CityScapes
● Real driving scenes
● 30 categories
● +25,000 images
● 20,000 partial annotations
● 5,000 dense annotations
● Semantic segmentation GT
● Instance segmentation GT
● Depth, GPS and other metadata
● Objects and stuff
ADE20K
● Real general scenes
● +150 categories
● +22,000 images
● Semantic segmentation GT
● Instance + parts segmentation GT
● Objects and stuff
From classification to semantic segmentation
[Figure: patch-wise classification with a CNN, example labels DOG / CAT]
● Extract a patch
● Run it through a CNN trained for image classification
● Classify the center pixel (e.g. CAT)
● Repeat for every pixel
From classification to semantic segmentation
● Turning a classification network into a fully convolutional one
○ Fully connected layers can also be viewed as convolutions with kernels that cover the entire input region
Shelhamer, Long, Darrell, Fully Convolutional Networks for Semantic Segmentation, 2014-2016
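The equivalence between a fully connected layer and a convolution whose kernel covers the whole input can be checked numerically. The following is a minimal single-channel sketch in plain Python; the sizes (3x3 input, 2 output units, 5x5 larger image) and all function names are illustrative assumptions, not taken from the FCN code.

```python
import random

def fc_layer(x_flat, weights):
    """Fully connected layer: one dot product per output unit."""
    return [sum(w * x for w, x in zip(row, x_flat)) for row in weights]

def conv2d_valid(image, kernel):
    """Single-channel 'valid' cross-correlation (the deep learning 'convolution')."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

random.seed(0)
H = W = 3      # spatial size the classifier was designed for (assumed)
n_out = 2      # number of output units (assumed)
image = [[random.random() for _ in range(W)] for _ in range(H)]
weights = [[random.random() for _ in range(H * W)] for _ in range(n_out)]

# Fully connected layer applied to the flattened input.
flat = [v for row in image for v in row]
fc_out = fc_layer(flat, weights)

# The same weights reshaped into H x W kernels, one per output unit.
kernels = [[row[r * W:(r + 1) * W] for r in range(H)] for row in weights]
conv_out = [conv2d_valid(image, k) for k in kernels]   # each map is 1x1

# On a larger 5x5 input the convolutional form yields a 3x3 score map
# per output unit instead of a single score.
big = [[random.random() for _ in range(5)] for _ in range(5)]
big_out = [conv2d_valid(big, k) for k in kernels]
```

On the 3x3 input both forms give the same numbers; on the larger input only the convolutional form still applies, which is what makes dense prediction possible.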
From classification to semantic segmentation
● Dense prediction: fully convolutional, end to end, pixel-to-pixel network
● Problems
○ Output is smaller than input → Add upsampling layers
○ Output is very coarse → Add fine details from previous layers
Final layer is a 1x1 conv with #channels = #classes
Pixelwise loss function: [formula shown on the slide]
Credit: Shelhamer, Long
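The loss formula itself is not preserved in this transcript. A common choice for a pixelwise loss, assumed here, is the softmax cross-entropy averaged over all pixels. The sketch below uses plain Python lists; `score_map`, `labels` and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one pixel's class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pixelwise_cross_entropy(score_map, labels):
    """Average cross-entropy over pixels.

    score_map[r][c] is a list with one score per class (the output of the
    final 1x1 conv); labels[r][c] is the integer ground-truth class.
    """
    total, count = 0.0, 0
    for score_row, label_row in zip(score_map, labels):
        for scores, label in zip(score_row, label_row):
            total += -math.log(softmax(scores)[label])
            count += 1
    return total / count

# 2x2 image, 3 classes, all scores equal: every pixel is maximally uncertain.
uniform_scores = [[[0.0, 0.0, 0.0] for _ in range(2)] for _ in range(2)]
labels = [[0, 1], [2, 0]]
uniform_loss = pixelwise_cross_entropy(uniform_scores, labels)

# Scores that strongly favour the correct class give a much smaller loss.
confident_scores = [[[10.0 if k == labels[r][c] else 0.0 for k in range(3)]
                     for c in range(2)] for r in range(2)]
confident_loss = pixelwise_cross_entropy(confident_scores, labels)
```

With uniform scores the loss equals log(3), the entropy of a uniform guess over three classes.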
From classification to semantic segmentation
[Figure: conv, pool, non-linearity → learnable upsampling → pixelwise output + loss]
Credit: Shelhamer, Long
Learnable upsampling: recovering spatial shape
Upsampling: transposed convolution (also called fractionally strided convolution or ‘deconvolution’)
Convolution as a matrix operation
● I: input image, 4x4, vectorized to 16x1
● O: output image, 4x1 (later reshaped to 2x2)
● h: 3x3 kernel
● C: 4x16 sparse matrix built from the kernel weights, so that O = C I
● The backward pass is obtained by transposing C: C^T (16x4)
Transposed convolution swaps the forward and backward passes of a convolution: it multiplies by C^T, mapping a 4x1 vector back to 16x1 (4x4)
More info:
Dumoulin et al, A guide to convolution arithmetic for deep learning, 2016
https://github.com/vdumoulin/conv_arithmetic
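The matrix view can be made concrete with a small sketch: build C for a 3x3 kernel on a 4x4 input (stride 1, no padding), apply it for the forward pass, and apply its transpose to go from 4 values back to 16. In this sketch C has one row per output pixel, so it has 4 rows and 16 columns. The kernel values and helper names are illustrative assumptions.

```python
def conv_matrix(kernel, in_size):
    """Build the sparse matrix C so that C @ vec(input) = vec(output)
    for a stride-1, unpadded convolution on an in_size x in_size input."""
    k = len(kernel)
    out_size = in_size - k + 1
    rows = []
    for r in range(out_size):
        for c in range(out_size):
            row = [0.0] * (in_size * in_size)
            for i in range(k):
                for j in range(k):
                    row[(r + i) * in_size + (c + j)] = kernel[i][j]
            rows.append(row)
    return rows

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

kernel = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
C = conv_matrix(kernel, 4)              # 4 rows, 16 columns
I = [float(v) for v in range(16)]       # 4x4 input, vectorized
O = matvec(C, I)                        # forward pass: 4 values (2x2)
up = matvec(transpose(C), O)            # transposed convolution: 16 values (4x4)
```

Note that multiplying by the transpose restores the shape of the input, not its values: the transposed convolution is not an inverse of the convolution.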
Learnable upsampling: recovering spatial shape
It is always possible to emulate a transposed convolution with a direct convolution (fractional
stride)
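A 1D sketch of this emulation, under stated assumptions: a transposed convolution with stride s is reproduced by inserting s-1 zeros between input samples, padding with k-1 zeros on each side, and running an ordinary stride-1 convolution with the flipped kernel. Function names and the example values are hypothetical.

```python
def transposed_conv1d(x, w, stride):
    """Direct definition: each input sample scatters a scaled copy of w."""
    out = [0.0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            out[i * stride + j] += xi * wj
    return out

def emulate_with_direct_conv(x, w, stride):
    """Same result from a stride-1 convolution on a zero-stuffed input."""
    # Insert (stride - 1) zeros between consecutive input samples.
    z = []
    for i, xi in enumerate(x):
        z.append(xi)
        if i < len(x) - 1:
            z.extend([0.0] * (stride - 1))
    # Pad with (k - 1) zeros on both sides.
    k = len(w)
    pad = [0.0] * (k - 1)
    p = pad + z + pad
    # Correlate with the flipped kernel at stride 1.
    flipped = w[::-1]
    return [sum(flipped[j] * p[m + j] for j in range(k))
            for m in range(len(p) - k + 1)]

x = [1.0, 2.0, 3.0]
w = [1.0, 0.5, 0.25]
a = transposed_conv1d(x, w, 2)
b = emulate_with_direct_conv(x, w, 2)
```

The zero insertion is what gives the name "fractionally strided": the kernel effectively moves half an input sample per output step when the stride is 2.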
Learnable upsampling: recovering spatial shape
1D example:
[Figure panels: 1D convolution with stride 2; 1D transposed convolution with stride 2; 1D subpixel convolution with stride 1/2]
The two operators (transposed convolution and subpixel convolution) can achieve the same result if the filters are learned.
Shi, Is the deconvolution layer the same as a convolutional layer?, 2016
More than one upsampling layer
DeconvNet: VGG-16 (conv + ReLU + max pooling) followed by a mirrored VGG (unpooling + ‘deconv’ + ReLU)
[Figure: normal VGG followed by an “upside down” VGG]
Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015
Resolution: spectrum of deep features
Problem: coarse output
● Combine where (local, shallow) with what (global, deep)
fuse features into deep jet
(cf. Hariharan et al. CVPR15 “hypercolumn”)
Credit: Shelhamer, Long
Fine details: skip connections
● Add a 1x1 conv classification layer on top of pool4
● Upsample the conv7 prediction x2 (initialized to bilinear interpolation, then learned)
● Sum both predictions and upsample x16 for the output
● End-to-end, joint learning of semantics and location
[Figure label: skip to fuse layers]
Credit: Shelhamer, Long
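The fusion step can be sketched in 1D for a single class: upsample the coarse conv7 scores x2 with a transposed convolution whose kernel starts as bilinear interpolation, add the pool4 scores, then upsample x16. The score values and function names below are made-up assumptions; in the real network the x2 kernel is subsequently learned and the maps are 2D with one channel per class.

```python
def bilinear_kernel_1d(factor):
    """1D bilinear interpolation weights for upsampling by `factor`."""
    size = 2 * factor - factor % 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    return [1 - abs(i - center) / factor for i in range(size)]

def upsample_1d(x, factor):
    """Transposed convolution with a bilinear kernel, cropped to len(x) * factor."""
    w = bilinear_kernel_1d(factor)
    full = [0.0] * ((len(x) - 1) * factor + len(w))
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            full[i * factor + j] += xi * wj
    crop = (len(full) - len(x) * factor) // 2
    return full[crop:crop + len(x) * factor]

# Hypothetical per-location scores for one class.
conv7_scores = [1.0, 1.0, 1.0]                   # coarse, stride 32
pool4_scores = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]    # finer, stride 16

conv7_up = upsample_1d(conv7_scores, 2)          # bring conv7 to stride 16
fused = [a + b for a, b in zip(pool4_scores, conv7_up)]
output = upsample_1d(fused, 16)                  # back to input resolution
```

Border values are attenuated because the transposed convolution implicitly sees zeros outside the input; interior values of a constant signal are preserved.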
Skip connections
● A multi-stream network that fuses features/predictions across layers
[Figure panels: input image; stride 32 (no skips); stride 16 (1 skip); stride 8 (2 skips); ground truth]
Credit: Shelhamer, Long
Transfer learning
● Cast ILSVRC classifiers (AlexNet, VGG, GoogLeNet) into FCNs and augment them for dense prediction: discard the classifier layer, transform FC layers to CONV, add a 1x1 CONV with 21 filters for scoring at each output location, then upsample
● Add skip connections
● Train for segmentation by fine-tuning all layers on PASCAL VOC 2011 with a pixelwise loss
● Metrics: pixel accuracy, mean accuracy, mean pixel intersection over union (mean IU)
Mean IU: per-class evaluation. For a given class, the intersection of the predicted and true sets of pixels is divided by their union; the result is averaged over classes.
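The three metrics can all be derived from a confusion matrix. The sketch below is one way to compute them in plain Python; it assumes every class occurs at least once in the ground truth (otherwise the per-class ratios are undefined), and the toy labels are made up.

```python
def confusion_matrix(truth, pred, num_classes):
    """n[i][j] = number of pixels of true class i predicted as class j."""
    n = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        n[t][p] += 1
    return n

def segmentation_metrics(n):
    """Return (pixel accuracy, mean accuracy, mean IU)."""
    k = len(n)
    true_total = [sum(row) for row in n]                              # pixels per true class
    pred_total = [sum(n[i][j] for i in range(k)) for j in range(k)]   # pixels per predicted class
    pixel_acc = sum(n[i][i] for i in range(k)) / sum(true_total)
    mean_acc = sum(n[i][i] / true_total[i] for i in range(k)) / k
    # Union = true pixels + predicted pixels - intersection.
    mean_iu = sum(n[i][i] / (true_total[i] + pred_total[i] - n[i][i])
                  for i in range(k)) / k
    return pixel_acc, mean_acc, mean_iu

# Flattened toy label maps with two classes.
truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
n = confusion_matrix(truth, pred, 2)
pixel_acc, mean_acc, mean_iu = segmentation_metrics(n)
```

Mean IU penalizes both missed pixels and false positives for each class, which is why it is lower than pixel accuracy in this example.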
[Tables: results on the Pascal test set and the Pascal validation set. Models are based on VGG; FCN-32s = FCN-VGG]
FCN-8s results on Pascal
SDS: Simultaneous Detection and Segmentation, Hariharan et al., ECCV 2014
Semantic segmentation
Typical architecture
● Downsampling path: extracts coarse features
● Upsampling path: recovers input image resolution
● Skip connections: recover detailed information
● Post-processing (optional): refines predictions (CRF)
Other architectures:
● DeepLab: ‘atrous’ convolutions + spatial pyramid + CRF (Chen, ICLR 2015)
● CRF-RNN: FCN + CRF as Recurrent NN (Zheng, ICCV 2015)
● U-Net (Ronneberger, 2015)
● Fully Convolutional DenseNets (Jégou, 2016)
● Dilated convolutions (Yu, 2016)
U-Net
● A contracting path and an expansive path
● Adds convolutions in the upsampling path (“symmetric” net)
● Skip connections: concatenation of feature maps
Ronneberger et al, “U-Net: Convolutional Networks for Biomedical Image Segmentation”, arXiv 2015
Winner of:
● CAD Caries challenge, ISBI 2015
● Cell tracking challenge, ISBI 2015
Fully convolutional DenseNets
● Adds feed-forward connections between layers
● Based on U-Nets:
○ connections between downsampling – upsampling paths
● Based on DenseNets* (for image classification):
○ each layer directly connected to every other layer
○ alleviate the vanishing-gradient problem
○ strengthen feature propagation
○ encourage feature reuse
○ substantially reduce the number of parameters.
Jégou et al, “The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation”, Dec. 2016
* Huang et al, “Densely connected convolutional networks”, arXiv, Aug. 2016
[Figure panels: dense block; complete architecture]
Dilated convolutions
● Systematically aggregate multiscale contextual information without losing resolution
○ Usual convolution
○ Dilated convolution
Yu, Koltun, Multi-scale context aggregation by dilated convolutions, 2016
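The formulas for the two convolutions are not preserved in this transcript. As a rough 1D sketch, a dilated convolution reads its input with gaps of `dilation` samples between kernel taps, so the receptive field grows without adding weights. The helper names and example values are assumptions for illustration; this version uses no padding, so the output shrinks, whereas padding would keep the resolution.

```python
def dilated_conv1d(x, w, dilation):
    """Unpadded 1D dilated convolution (cross-correlation form).
    dilation=1 is the usual convolution."""
    span = dilation * (len(w) - 1)   # distance between first and last tap
    return [sum(w[j] * x[i + j * dilation] for j in range(len(w)))
            for i in range(len(x) - span)]

def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 layers with the given dilations."""
    rf = 1
    for d in dilations:
        rf += d * (kernel_size - 1)
    return rf

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
w = [1.0, 1.0, 1.0]
usual = dilated_conv1d(x, w, 1)      # taps at i, i+1, i+2
dilated = dilated_conv1d(x, w, 2)    # taps at i, i+2, i+4

# Three 3-tap layers with dilations 1, 2, 4 cover 15 input samples.
rf = receptive_field(3, [1, 2, 4])
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the number of parameters grows only linearly.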
Instance segmentation
● Detect instances, categorize and label every pixel
● Labels are class-aware and instance-aware
Arnab, Torr, “Pixelwise instance segmentation with a dynamically instantiated network”, CVPR 2017
[Figure panels: object detection, semantic segmentation, instance segmentation, ground truth]
Instance segmentation: Multi-task cascades
Dai et al, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, arXiv 2015
Won COCO 2015 challenge (with ResNet)
Learn entire model end to end
Instance segmentation: Multi-task cascades
[Tables: results on Pascal VOC 2012 and MS COCO]
Instance segmentation: Mask R-CNN
● Extension of Faster R-CNN to instance segmentation
● A Fully Convolutional Network (FCN) is added on top of the CNN features of Faster R-CNN to
generate a mask (segmentation output).
● This is in parallel to the classification and bounding box regression network of Faster R-CNN
● RoIAlign replaces RoIPool to properly align the extracted features with the input
He et al, Mask R-CNN, 2017
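The core idea behind RoIAlign can be illustrated with its sampling step: instead of rounding a fractional coordinate to the nearest feature-map cell, the feature value is bilinearly interpolated at the exact location. The sketch below shows only that step, on a made-up 2x2 feature map with hypothetical function names; a full RoIAlign additionally samples several points per bin and aggregates them.

```python
import math

def bilinear_sample(feature, y, x):
    """Interpolate a 2D feature map at a fractional (y, x) location."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, len(feature) - 1)      # clamp at the border
    x1 = min(x0 + 1, len(feature[0]) - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def quantized_sample(feature, y, x):
    """Quantized lookup: the coordinate is snapped to a grid cell,
    a simplified stand-in for the rounding done in RoIPool."""
    return feature[int(math.floor(y))][int(math.floor(x))]

feature = [[0.0, 1.0],
           [2.0, 3.0]]
aligned = bilinear_sample(feature, 0.5, 0.5)    # uses all four neighbours
snapped = quantized_sample(feature, 0.5, 0.5)   # ignores the fractional part
```

The quantized lookup discards the fractional offset, which is the misalignment that matters for pixel-accurate masks.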
Instance segmentation: Mask R-CNN
● Classification and bounding box detection losses like Faster R-CNN
● A new loss term for mask prediction
● Output: C x m x m volume for mask prediction (C classes, m size of square mask)
He et al, Mask R-CNN, 2017
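A minimal sketch of the mask loss, following the description in the Mask R-CNN paper: a per-pixel sigmoid is applied to the C x m x m output and the average binary cross-entropy is computed only on the channel of the ground-truth class. The shapes (C = 2, m = 2), values and names below are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mask_loss(mask_logits, gt_class, gt_mask):
    """Average binary cross-entropy on the ground-truth class channel.

    mask_logits: C x m x m nested lists of raw scores.
    gt_class: index of the ground-truth class of this RoI.
    gt_mask: m x m nested lists of 0/1 targets.
    """
    logits = mask_logits[gt_class]   # other class channels do not contribute
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits, gt_mask):
        for z, t in zip(logit_row, target_row):
            p = sigmoid(z)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            count += 1
    return total / count

gt_mask = [[1, 0], [0, 1]]
logits_a = [[[0.0, 0.0], [0.0, 0.0]],     # class 0 channel
            [[0.0, 0.0], [0.0, 0.0]]]     # class 1 channel
logits_b = [[[5.0, -5.0], [3.0, 2.0]],    # class 0 channel changed
            [[0.0, 0.0], [0.0, 0.0]]]     # class 1 channel unchanged

loss_a = mask_loss(logits_a, 1, gt_mask)
loss_b = mask_loss(logits_b, 1, gt_mask)
```

Because only the ground-truth channel is used, the masks of different classes do not compete with each other; the class decision is left to the classification branch.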
Instance segmentation: Mask R-CNN
● Masks are combined with classifications and bounding boxes from Faster R-CNN
He et al, Mask R-CNN, 2017
Instance segmentation: Mask R-CNN
● Results on COCO dataset
● MNC and FCIS were the winners of the COCO 2015 and 2016 segmentation challenges, respectively
MNC: Dai et al, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, arXiv 2015
FCIS: Li et al., “Fully convolutional instance-aware semantic segmentation”, CVPR 2017
Summary
● Semantic segmentation
○ Fully convolutional networks
○ Learnable Upsampling
○ Skip connections
○ Models:
■ FCN-8s, Dilated Convolutions, U-Net, FC DenseNets
● Instance segmentation
○ Based on object/segment proposals
■ Simultaneous Detection and Segmentation (R-CNN)
■ Multi-task cascade (Faster R-CNN)
■ Mask R-CNN (Faster R-CNN)
○ Others
■ Recurrent instance segmentation
Questions?
FCN-8s vs DeepLab vs Dilated Convolutions
[Figure panels: input image, FCN-8s, DeepLab, DilConv, ground truth]
Pascal VOC 2012 test set, mean IoU:
● FCN-8s = 62.2
● DeepLab = 62.1
● DilConv = 67.6
FCN-8s: Fully Convolutional Networks for Semantic Segmentation, Long, Darrell, Shelhamer, 2014-2016
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, Chen et al, 2015
Dilated convolutions: Multi-scale context aggregation by dilated convolutions, Yu, Koltun, 2016

More Related Content

What's hot

Deep learning for image super resolution
Deep learning for image super resolutionDeep learning for image super resolution
Deep learning for image super resolutionPrudhvi Raj
 
Image Smoothing using Frequency Domain Filters
Image Smoothing using Frequency Domain FiltersImage Smoothing using Frequency Domain Filters
Image Smoothing using Frequency Domain FiltersSuhaila Afzana
 
IMAGE SEGMENTATION TECHNIQUES
IMAGE SEGMENTATION TECHNIQUESIMAGE SEGMENTATION TECHNIQUES
IMAGE SEGMENTATION TECHNIQUESVicky Kumar
 
Introduction to Computer Vision.pdf
Introduction to Computer Vision.pdfIntroduction to Computer Vision.pdf
Introduction to Computer Vision.pdfKnoldus Inc.
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNNShuai Zhang
 
Semantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSemantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSungjoon Choi
 
U-Netpresentation.pptx
U-Netpresentation.pptxU-Netpresentation.pptx
U-Netpresentation.pptxNoorUlHaq47
 
Chapter 9 morphological image processing
Chapter 9 morphological image processingChapter 9 morphological image processing
Chapter 9 morphological image processingasodariyabhavesh
 

What's hot (20)

Edge detection
Edge detectionEdge detection
Edge detection
 
Deep learning for image super resolution
Deep learning for image super resolutionDeep learning for image super resolution
Deep learning for image super resolution
 
Computer Vision
Computer VisionComputer Vision
Computer Vision
 
Image Smoothing using Frequency Domain Filters
Image Smoothing using Frequency Domain FiltersImage Smoothing using Frequency Domain Filters
Image Smoothing using Frequency Domain Filters
 
U-Net (1).pptx
U-Net (1).pptxU-Net (1).pptx
U-Net (1).pptx
 
IMAGE SEGMENTATION.
IMAGE SEGMENTATION.IMAGE SEGMENTATION.
IMAGE SEGMENTATION.
 
Computer Vision
Computer VisionComputer Vision
Computer Vision
 
Lec15 sfm
Lec15 sfmLec15 sfm
Lec15 sfm
 
Image segmentation
Image segmentationImage segmentation
Image segmentation
 
IMAGE SEGMENTATION TECHNIQUES
IMAGE SEGMENTATION TECHNIQUESIMAGE SEGMENTATION TECHNIQUES
IMAGE SEGMENTATION TECHNIQUES
 
Computer vision
Computer visionComputer vision
Computer vision
 
Introduction to Computer Vision.pdf
Introduction to Computer Vision.pdfIntroduction to Computer Vision.pdf
Introduction to Computer Vision.pdf
 
Edge detection
Edge detectionEdge detection
Edge detection
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNN
 
Semantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSemantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep Learning
 
Deep Learning
Deep Learning Deep Learning
Deep Learning
 
Object Recognition
Object RecognitionObject Recognition
Object Recognition
 
U-Netpresentation.pptx
U-Netpresentation.pptxU-Netpresentation.pptx
U-Netpresentation.pptx
 
Segmentation
SegmentationSegmentation
Segmentation
 
Chapter 9 morphological image processing
Chapter 9 morphological image processingChapter 9 morphological image processing
Chapter 9 morphological image processing
 

Similar to Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)

物件偵測與辨識技術
物件偵測與辨識技術物件偵測與辨識技術
物件偵測與辨識技術CHENHuiMei
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImageryRAHUL BHOJWANI
 
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Panoptic Segmentation @CVPR2019
Panoptic Segmentation @CVPR2019Panoptic Segmentation @CVPR2019
Panoptic Segmentation @CVPR2019Kousuke Kuzuoka
 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networksananth
 
Pixel RNN to Pixel CNN++
Pixel RNN to Pixel CNN++Pixel RNN to Pixel CNN++
Pixel RNN to Pixel CNN++Dongheon Lee
 
GNR638_Course Project for spring semester
GNR638_Course Project for spring semesterGNR638_Course Project for spring semester
GNR638_Course Project for spring semesterBijayChandraDasTECH0
 
Review-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningReview-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningTrong-An Bui
 
Deep image retrieval learning global representations for image search
Deep image retrieval  learning global representations for image searchDeep image retrieval  learning global representations for image search
Deep image retrieval learning global representations for image searchUniversitat Politècnica de Catalunya
 
Semantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network ApproachesSemantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network ApproachesFellowship at Vodafone FutureLab
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...Edge AI and Vision Alliance
 
CNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent AdvancesCNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent AdvancesDmytro Mishkin
 
Deep image retrieval - learning global representations for image search - ub ...
Deep image retrieval - learning global representations for image search - ub ...Deep image retrieval - learning global representations for image search - ub ...
Deep image retrieval - learning global representations for image search - ub ...Universitat de Barcelona
 
Transformer in Vision
Transformer in VisionTransformer in Vision
Transformer in VisionSangmin Woo
 

Similar to Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision) (20)

物件偵測與辨識技術
物件偵測與辨識技術物件偵測與辨識技術
物件偵測與辨識技術
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite Imagery
 
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
 
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
Deep 3D Visual Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2017
 
Scene understanding
Scene understandingScene understanding
Scene understanding
 
Semantic Segmentation - Míriam Bellver - UPC Barcelona 2018
Semantic Segmentation - Míriam Bellver - UPC Barcelona 2018Semantic Segmentation - Míriam Bellver - UPC Barcelona 2018
Semantic Segmentation - Míriam Bellver - UPC Barcelona 2018
 
Panoptic Segmentation @CVPR2019
Panoptic Segmentation @CVPR2019Panoptic Segmentation @CVPR2019
Panoptic Segmentation @CVPR2019
 
Deep 3D Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2018
Deep 3D Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2018Deep 3D Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2018
Deep 3D Analysis - Javier Ruiz-Hidalgo - UPC Barcelona 2018
 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networks
 
Pixel RNN to Pixel CNN++
Pixel RNN to Pixel CNN++Pixel RNN to Pixel CNN++
Pixel RNN to Pixel CNN++
 
GNR638_Course Project for spring semester
GNR638_Course Project for spring semesterGNR638_Course Project for spring semester
GNR638_Course Project for spring semester
 
Review-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningReview-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learning
 
Deep image retrieval learning global representations for image search
Deep image retrieval  learning global representations for image searchDeep image retrieval  learning global representations for image search
Deep image retrieval learning global representations for image search
 
Semantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network ApproachesSemantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network Approaches
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
 
CNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent AdvancesCNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent Advances
 
Deep image retrieval - learning global representations for image search - ub ...
Deep image retrieval - learning global representations for image search - ub ...Deep image retrieval - learning global representations for image search - ub ...
Deep image retrieval - learning global representations for image search - ub ...
 
Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)
Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)
Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)
 
Transformer in Vision
Transformer in VisionTransformer in Vision
Transformer in Vision
 
lecture_16_jiajun.pdf
lecture_16_jiajun.pdflecture_16_jiajun.pdf
lecture_16_jiajun.pdf
 

More from Universitat Politècnica de Catalunya

The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 

More from Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 

Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)

  • 1. [course site] Verónica Vilaplana veronica.vilaplana@upc.edu Associate Professor Universitat Politecnica de Catalunya Technical University of Catalonia Segmentation Day 3 Lecture 1 #DLUPC
  • 2. Outline ● What is object segmentation? ○ Applications ● Semantic segmentation ○ From image classification to semantic segmentation ■ Fully convolutional networks ■ Learnable Upsampling ■ Skip connections ○ FCN8s, Dilated Convolutions, U-Net, ... ● Instance segmentation ○ Simultaneous Detection and Segmentation ○ Mask R-CNN
  • 3. Image and object segmentation ● Image Segmentation ○ Group pixels into regions that share some similar properties ● Segmenting images into meaningful objects ○ Object-level segmentation: accurate localization and recognition. Figure: Superpixels (Ren, ICCV 2003)
  • 4. Object segmentation: applications ● Image editing and composition (Xu, 2016) ● Robotics ● Autonomous driving (Cordts, 2016) ● Medical image analysis (Casamitjana, 2017)
  • 5. Semantic segmentation ● Label every pixel: recognize the class of every pixel ● Do not differentiate instances. Mottaghi et al, “The role of context for object detection and semantic segmentation in the wild”, CVPR 2014
  • 6. Instance segmentation ● Detect instances, categorize and label every pixel ● Labels are class-aware and instance-aware ● Figure panels: Object detection, Semantic segm., Instance segm., Ground truth. Arnab, Torr, “Pixelwise instance segmentation with a dynamically instantiated network”, CVPR 2017
  • 7. Datasets for semantic/instance segmentation. Pascal Visual Object Classes: ● 20 categories ● +10,000 images ● Semantic segmentation GT ● Instance segmentation GT. Pascal Context: ● Real indoor & outdoor scenes ● 540 categories ● +10,000 images ● Dense annotations ● Semantic segmentation GT ● Objects + stuff
  • 8. Datasets for semantic/instance segmentation. SUN RGB-D: ● Real indoor scenes ● 10,000 images ● 58,658 3D bounding boxes ● Dense annotations ● Instances GT ● Semantic segmentation GT ● Objects + stuff. COCO (Common Objects in Context): ● Real indoor & outdoor scenes ● 80 categories ● +300,000 images ● 2M instances ● Partial annotations ● Semantic segmentation GT ● Instance segmentation GT ● Objects, but no stuff
  • 9. Datasets for semantic/instance segmentation. CityScapes: ● Real driving scenes ● 30 categories ● +25,000 images ● 20,000 partial annotations ● 5,000 dense annotations ● Semantic segmentation GT ● Instance segmentation GT ● Depth, GPS and other metadata ● Objects and stuff. ADE20K: ● Real general scenes ● +150 categories ● +22,000 images ● Semantic segmentation GT ● Instance + parts segmentation GT ● Objects and stuff
  • 10. From classification to semantic segmentation ● Extract a patch ● Run it through a CNN trained for image classification (e.g. DOG, CAT) ● Classify the center pixel (CAT) ● Repeat for every pixel
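The patch-wise procedure above can be sketched in a few lines. This is only a toy illustration: `toy_classifier` is a hypothetical stand-in for the image-classification CNN (it thresholds the mean patch intensity), and `extract_patch` and `segment_by_patches` are names made up for this sketch. It mainly shows that the classifier runs once per pixel, which is what makes the approach expensive.

```python
def extract_patch(image, ci, cj, radius, pad_value=0):
    """Return the (2*radius+1)^2 patch centered at (ci, cj), zero-padded at borders."""
    h, w = len(image), len(image[0])
    return [[image[i][j] if 0 <= i < h and 0 <= j < w else pad_value
             for j in range(cj - radius, cj + radius + 1)]
            for i in range(ci - radius, ci + radius + 1)]


def toy_classifier(patch):
    """Hypothetical stand-in for a CNN: class 1 if the mean patch value exceeds 0.3."""
    values = [v for row in patch for v in row]
    return 1 if sum(values) / len(values) > 0.3 else 0


def segment_by_patches(image, classify, radius=1):
    """Label every pixel with the class predicted for the patch centered on it."""
    return [[classify(extract_patch(image, i, j, radius))
             for j in range(len(image[0]))]
            for i in range(len(image))]


image = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
labels = segment_by_patches(image, toy_classifier, radius=1)
# One classifier call per pixel: 16 calls for this 4x4 image.
```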
  • 11. From classification to semantic segmentation ● A classification network becoming fully convolutional ○ Fully connected layers can also be viewed as convolutions with kernels that cover the entire input region. Shelhamer, Long, Darrell, Fully Convolutional Networks for Semantic Segmentation, 2014-2016
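A minimal sketch of that equivalence, in plain Python with a single output unit and made-up weights: a fully connected unit on a 2x2 input gives the same value as a 2x2 convolution on that input, and the same kernel slid over a larger input yields a spatial map of scores. The helper names are illustrative only.

```python
def conv2d_valid(image, kernel):
    """Stride-1 'valid' cross-correlation of a 2D list with a 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def fully_connected(image, weights):
    """One fully connected unit: dot product of the flattened input and weights."""
    flat_x = [v for row in image for v in row]
    flat_w = [v for row in weights for v in row]
    return sum(x * w for x, w in zip(flat_x, flat_w))


weights = [[1, 0], [0, -1]]          # illustrative weights for one output unit
small = [[3, 5], [2, 4]]             # input of the size the FC layer expects
large = [[3, 5, 1], [2, 4, 0], [7, 6, 8]]

fc_out = fully_connected(small, weights)     # a single score
conv_small = conv2d_valid(small, weights)    # 1x1 map holding the same score
conv_large = conv2d_valid(large, weights)    # 2x2 map: one score per location
```

On the larger input the convolutional form reuses the same weights at every location, so the network produces a coarse score map instead of a single label.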
  • 12. From classification to semantic segmentation ● Dense prediction: fully convolutional, end to end, pixel-to-pixel network ● Problems ○ Output is smaller than input → Add upsampling layers ○ Output is very coarse → Add fine details from previous layers ● Final layer is 1x1 conv with #channels = #classes ● Pixelwise loss function. Credit: Shelhamer, Long
  • 13. From classification to semantic segmentation ● Dense prediction: fully convolutional, end to end, pixel-to-pixel network ● Pipeline: conv, pool, nonlinearity → learnable upsampling → pixelwise output + loss ● Final layer is 1x1 conv with #channels = #classes ● Pixelwise loss function. Credit: Shelhamer, Long
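The formula for the pixelwise loss does not survive in this transcript. A common choice for this kind of network, given here as an assumption rather than as the slide's exact expression, is the softmax cross-entropy averaged over all pixels. The symbols are introduced for this sketch: s_{ijc} is the score of class c at pixel (i, j) from the final 1x1 conv, y_{ijc} is the one-hot ground truth, H and W are the output height and width, and C is the number of classes.

```latex
p_{ijc} = \frac{\exp(s_{ijc})}{\sum_{c'=1}^{C} \exp(s_{ijc'})},
\qquad
\mathcal{L} = -\frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} \sum_{c=1}^{C} y_{ijc} \, \log p_{ijc}
```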
  • 14. Learnable upsampling: recovering spatial shape ● Upsampling: transposed convolution, also called fractionally strided convolution or ‘deconvolution’ ● Convolution as a matrix operation: I, input image 4x4 vectorized to 16x1; O, output image 4x1 (later reshaped to 2x2); h, 3x3 kernel; C, 4x16 sparse matrix holding the kernel weights, so that O = C I ● The backward pass is obtained by transposing C: C^T ● Transposed convolution swaps the forward and backward passes of a convolution. More info: Dumoulin et al, A guide to convolution arithmetic for deep learning, 2016 https://github.com/vdumoulin/conv_arithmetic
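The matrix view can be checked with a small plain-Python sketch. Here the matrix is built with 4 rows and 16 columns so that it maps the 16 vectorized input pixels to 4 outputs; the kernel values and helper names are arbitrary choices for illustration. Multiplying by the transpose maps 4 values back to 16, which is the shape-recovering step used for upsampling.

```python
def conv_matrix(kernel, in_size):
    """Matrix C such that C @ vec(input) equals vec(stride-1 valid convolution)."""
    k = len(kernel)
    out_size = in_size - k + 1
    rows = []
    for oi in range(out_size):
        for oj in range(out_size):
            row = [0] * (in_size * in_size)
            for a in range(k):
                for b in range(k):
                    row[(oi + a) * in_size + (oj + b)] = kernel[a][b]
            rows.append(row)
    return rows


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # arbitrary 3x3 kernel
C = conv_matrix(kernel, in_size=4)           # 4 rows, 16 columns
x = list(range(16))                          # 4x4 input, vectorized
y = matvec(C, x)                             # forward pass: 16 values -> 4 values
up = matvec(transpose(C), [1, 0, 0, 0])      # transposed pass: 4 values -> 16 values
```

With a one-hot input, the transposed pass simply stamps the kernel into the corresponding location of the 4x4 output, which is why overlapping stamps are summed in general.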
  • 15. Learnable upsampling: recovering spatial shape ● It is always possible to emulate a transposed convolution with a direct convolution (fractional stride). More info: Dumoulin et al, A guide to convolution arithmetic for deep learning, 2016 https://github.com/vdumoulin/conv_arithmetic
  • 16. Learnable upsampling: recovering spatial shape ● 1D example: 1D convolution with stride 2; 1D transposed convolution with stride 2; 1D subpixel convolution with stride 1/2 ● The last two operators (transposed and subpixel convolution) can achieve the same result if the filters are learned. Shi, Is the deconvolution layer the same as a convolutional layer?, 2016
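A 1D sketch of these operators in plain Python, with illustrative function names. It shows a stride-2 convolution shrinking a signal, a stride-2 transposed convolution growing it back to the original length, and the emulation of the transposed convolution by a direct stride-1 convolution over a zero-interleaved, zero-padded input with the flipped filter. Subpixel convolution is not implemented here.

```python
def conv1d(x, w, stride):
    """'Valid' 1D cross-correlation with the given stride."""
    k = len(w)
    return [sum(x[i + a] * w[a] for a in range(k))
            for i in range(0, len(x) - k + 1, stride)]


def transposed_conv1d(y, w, stride):
    """Each input value stamps a scaled copy of w into the output; overlaps add."""
    k = len(w)
    out = [0] * ((len(y) - 1) * stride + k)
    for i, v in enumerate(y):
        for a in range(k):
            out[i * stride + a] += v * w[a]
    return out


def emulate_with_direct_conv(y, w, stride):
    """Same result via zero insertion, zero padding and a stride-1 convolution."""
    k = len(w)
    dilated = []
    for i, v in enumerate(y):
        dilated.append(v)
        if i < len(y) - 1:
            dilated.extend([0] * (stride - 1))   # zeros between input samples
    padded = [0] * (k - 1) + dilated + [0] * (k - 1)
    return conv1d(padded, w[::-1], stride=1)     # flipped filter


down = conv1d([1, 2, 3, 4, 5, 6, 7], [1, 0, -1], stride=2)   # length 7 -> 3
up = transposed_conv1d([1, 2, 3], [1, 1, 1], stride=2)       # length 3 -> 7
same = emulate_with_direct_conv([1, 2, 3], [1, 1, 1], stride=2)
```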
  • 17. More than one upsampling layer ● DeconvNet: VGG-16 (conv+ReLU+MaxPool) + mirrored VGG (Unpooling+‘deconv’+ReLU) ● Normal VGG followed by an “upside down” VGG. Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015
  • 18. Resolution: spectrum of deep features ● Problem: coarse output ● Combine where (local, shallow) with what (global, deep) ● Fuse features into deep jet (cf. Hariharan et al. CVPR15 “hypercolumn”). Credit: Shelhamer, Long
  • 19. Fine details: skip connections ● Skip to fuse layers: add a 1x1 conv classifying layer on top of pool4, upsample the conv7 prediction x2 (initialized to bilinear and then learned), sum both, and upsample x16 for the output ● End-to-end, joint learning of semantics and location. Credit: Shelhamer, Long
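The fusion step can be sketched on a single class-score channel. Note the simplification: nearest-neighbour upsampling is swapped in here for the bilinear-initialized, learned upsampling the slide describes, and the score values are invented for illustration.

```python
def upsample_nearest(score, factor):
    """Nearest-neighbour upsampling of a 2D score map (stand-in for learned upsampling)."""
    h, w = len(score), len(score[0])
    return [[score[i // factor][j // factor] for j in range(w * factor)]
            for i in range(h * factor)]


def add_maps(a, b):
    """Element-wise sum of two score maps of equal size."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Hypothetical scores for one class on a 32x32 input image.
conv7_score = [[1.0]]                        # stride 32: 1x1 map
pool4_score = [[0.5, -0.5], [0.0, 2.0]]      # stride 16: 2x2 map from the 1x1 conv on pool4

fused = add_maps(upsample_nearest(conv7_score, 2), pool4_score)   # 2x2 fused prediction
output = upsample_nearest(fused, 16)                               # back to 32x32
```

The coarse map contributes the same score to every location it covers, while the pool4 scores adjust it locally, which is the "what" plus "where" combination.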
  • 20. Skip connections ● A multi-stream network that fuses features/predictions across layers ● Figure panels: input image; stride 32 (no skips); stride 16 (1 skip); stride 8 (2 skips); ground truth. Credit: Shelhamer, Long
  • 21. Transfer learning ● Cast ILSVRC classifiers (AlexNet, VGG, GoogLeNet) into FCNs and augment them for dense prediction: discard the classifier layer, transform FC layers to CONV, add a 1x1 CONV with 21 filters for scoring at each output location, then upsample ● Add skip connections ● Train for segmentation by fine-tuning all layers on PASCAL VOC 2011 with a pixelwise loss ● Metrics: pixel accuracy, mean accuracy, mean intersection over union (mean IU) ● Mean IU is a per-class evaluation: the intersection of the predicted and true sets of pixels for a given class, divided by their union, averaged over classes ● Result tables: Pascal test set; Pascal validation set (based on VGG; FCN-32s = FCN-VGG)
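The three metrics can be computed from a confusion matrix, as in the sketch below. The function names and the toy labels are illustrative, and the code assumes every class appears at least once in the ground truth (otherwise the per-class ratios would divide by zero).

```python
def confusion_matrix(gt, pred, num_classes):
    """n[i][j] = number of pixels of true class i predicted as class j."""
    n = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        n[g][p] += 1
    return n


def segmentation_metrics(n):
    """Return (pixel accuracy, mean accuracy, mean IU) from a confusion matrix."""
    num_classes = len(n)
    true_count = [sum(row) for row in n]                      # pixels per true class
    pred_count = [sum(n[j][i] for j in range(num_classes))    # pixels per predicted class
                  for i in range(num_classes)]
    correct = [n[i][i] for i in range(num_classes)]
    pixel_acc = sum(correct) / sum(true_count)
    mean_acc = sum(c / t for c, t in zip(correct, true_count)) / num_classes
    # IU per class: intersection / (true + predicted - intersection)
    mean_iu = sum(c / (t + p - c)
                  for c, t, p in zip(correct, true_count, pred_count)) / num_classes
    return pixel_acc, mean_acc, mean_iu


# Flattened toy label maps with 3 classes.
gt = [0, 0, 1, 1, 1, 2]
pred = [0, 1, 1, 1, 2, 2]
pixel_acc, mean_acc, mean_iu = segmentation_metrics(confusion_matrix(gt, pred, 3))
```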
  • 22. FCN-8s results on Pascal ● SDS: Simultaneous Detection and Segmentation, Hariharan et al., ECCV14
  • 23. Semantic segmentation. Typical architecture: ● Downsampling path: extracts coarse features ● Upsampling path: recovers input image resolution ● Skip connections: recover detailed information ● Post-processing (optional): refines predictions (CRF). Other architectures: ● DeepLab: ‘atrous’ convolutions + spatial pyramid + CRF (Chen, ICLR 2015) ● CRF-RNN: FCN + CRF as Recurrent NN (Zheng, ICCV 2015) ● U-Net (Ronneberger, 2015) ● Fully Convolutional DenseNets (Jégou, 2016) ● Dilated convolutions (Yu, 2016)
  • 24. U-Net ● A contracting path and an expansive path ● Adds convolutions in the upsampling path (“symmetric” net) ● Skip connections: concatenation of feature maps ● Winner of the CAD Caries challenge (ISBI 2015) and the Cell Tracking Challenge (ISBI 2015). Ronneberger et al, “U-Net: Convolutional Networks for Biomedical Image Segmentation”, arXiv 2015
  • 25. Fully convolutional DenseNets ● Adds feed-forward connections between layers ● Based on U-Nets: ○ connections between downsampling and upsampling paths ● Based on DenseNets* (for image classification): ○ each layer directly connected to every other layer ○ alleviate the vanishing-gradient problem ○ strengthen feature propagation ○ encourage feature reuse ○ substantially reduce the number of parameters ● Figures: dense block; complete architecture. Jégou et al, “The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation”, Dec. 2016. * Huang et al, “Densely connected convolutional networks”, arXiv, Aug 2016
  • 26. Dilated convolutions ● Systematically aggregate multiscale contextual information without losing resolution ○ Usual convolution ○ Dilated convolution. Yu, Koltun, Multi-scale context aggregation by dilated convolutions, 2016
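A 1D sketch of the idea in plain Python, with illustrative names: a dilated filter reads inputs that are `dilation` samples apart, so the receptive field grows without any pooling or striding. The helper `receptive_field` assumes a stack of stride-1 layers that all share one kernel size.

```python
def dilated_conv1d(x, w, dilation):
    """'Valid' 1D cross-correlation whose taps are spaced `dilation` samples apart."""
    k = len(w)
    span = (k - 1) * dilation + 1            # input samples covered by one output
    return [sum(x[i + a * dilation] * w[a] for a in range(k))
            for i in range(len(x) - span + 1)]


def receptive_field(kernel_size, dilations):
    """Receptive field (per axis) of stacked stride-1 layers with the given dilations."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


x = [1, 2, 3, 4, 5, 6, 7]
usual = dilated_conv1d(x, [1, 1, 1], dilation=1)     # neighbouring samples
dilated = dilated_conv1d(x, [1, 1, 1], dilation=2)   # every other sample
rf = receptive_field(3, [1, 2, 4])                   # three 3-tap layers, doubling dilation
```

With dilations 1, 2 and 4, three 3-tap layers already cover 15 samples per axis while using the same number of weights as three ordinary layers.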
  • 27. Instance segmentation ● Detect instances, categorize and label every pixel ● Labels are class-aware and instance-aware ● Figure panels: Object detection, Semantic segm., Instance segm., Ground truth. Arnab, Torr, “Pixelwise instance segmentation with a dynamically instantiated network”, CVPR 2017
  • 28. Instance segmentation: Multi-task cascades ● Learn entire model end to end ● Won COCO 2015 challenge (with ResNet). Dai et al, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, arXiv 2015
  • 29. Instance segmentation: Multi-task cascades ● Results on Pascal VOC 2012 and MS COCO
  • 30. Instance segmentation: Mask R-CNN ● Extension of Faster R-CNN to instance segmentation ● A Fully Convolutional Network (FCN) is added on top of the CNN features of Faster R-CNN to generate a mask (segmentation output) ● This branch runs in parallel to the classification and bounding box regression network of Faster R-CNN ● RoIAlign is used instead of RoIPool to properly align the extracted features with the input. He et al, Mask R-CNN, 2017
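The alignment point can be illustrated with the bilinear sampling that RoIAlign relies on. This is a simplified sketch with made-up names and values: it samples a single fractional location, whereas the full operator samples several points per output bin and pools them. The quantized lookup stands in for the coordinate rounding done by RoIPool.

```python
import math


def bilinear_sample(fmap, y, x):
    """Interpolate a 2D feature map at a fractional (y, x) location."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, len(fmap) - 1)          # clamp neighbours at the border
    x1 = min(x0 + 1, len(fmap[0]) - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


fmap = [[0.0, 10.0],
        [20.0, 30.0]]

aligned = bilinear_sample(fmap, 0.5, 0.5)    # uses all four neighbours
quantized = fmap[int(0.5)][int(0.5)]         # coordinate truncated to a grid cell
```

The quantized lookup discards the fractional offset, while the interpolated value reflects where the sampling point actually falls between feature-map cells.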
  • 31. Instance segmentation: Mask R-CNN ● Classification and bounding box detection losses like Faster R-CNN ● A new loss term for mask prediction ● Output: C x m x m volume for mask prediction (C classes, m size of square mask). He et al, Mask R-CNN, 2017
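Written out, the multi-task loss for each sampled region follows the form given in the Mask R-CNN paper. The mask term is an average binary cross-entropy over the m x m mask of the ground-truth class k only; the other C - 1 masks do not contribute. In the sketch below, y_{ij} is the binary ground-truth mask and \hat{y}^{k}_{ij} is the per-pixel sigmoid output for class k (notation chosen here for illustration).

```latex
L = L_{cls} + L_{box} + L_{mask},
\qquad
L_{mask} = -\frac{1}{m^{2}} \sum_{i=1}^{m} \sum_{j=1}^{m}
\left[ y_{ij} \log \hat{y}^{k}_{ij} + (1 - y_{ij}) \log\left(1 - \hat{y}^{k}_{ij}\right) \right]
```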
  • 32. Instance segmentation: Mask R-CNN ● Masks are combined with classifications and bounding boxes from Faster R-CNN. He et al, Mask R-CNN, 2017
  • 33. Instance segmentation: Mask R-CNN ● Results on COCO dataset ● MNC and FCIS were the winners of COCO 2015 and 2016, respectively. MNC: Dai et al, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, arXiv 2015. FCIS: Li et al., “Fully convolutional instance-aware semantic segmentation”, CVPR 2017
  • 34. Summary ● Semantic segmentation ○ Fully convolutional networks ○ Learnable Upsampling ○ Skip connections ○ Models: ■ FCN8s, Dilated Convolutions, U-Net, FC DenseNets ● Instance segmentation ○ Based on object/segments proposals ■ Simultaneous Detection and Segmentation (R-CNN) ■ Multi-task cascade (Faster R-CNN) ■ Mask R-CNN (Faster R-CNN) ○ Others ■ Recurrent instance segmentation
  • 36. FCN-8s vs DeepLab vs Dilated Convolutions ● Figure panels: Input image, FCN-8s, DeepLab, DilConv, Ground truth ● Pascal VOC 2012 test set, mean IoU: FCN-8s = 62.2, DeepLab = 62.1, DilConv = 67.6 ● FCN-8s: Fully Convolutional Networks for Semantic Segmentation, Long, Darrell, Shelhamer, 2014-2016 ● DeepLab: Semantic Image Segm. with Deep Conv. Nets, Atrous Convolution, and Fully Connected CRFs, Chen, et al, 2015 ● Dilated convolutions: Multi-scale context aggregation by dilated convolutions, Yu, Koltun, 2016