
Improving region-based CNN object detectors using Bayesian optimization

A review of state-of-the-art region-based CNN object detectors and how they can be improved using Bayesian optimization.

Publié dans : Données & analyses
  • Soyez le premier à commenter

  • Soyez le premier à aimer ceci

Improving region based CNN object detector using bayesian optimization

  1. Improving Region-based CNN Object Detectors using Bayesian Optimization (Amgad Muhammad)
  2. Agenda • Background • Problem definition • Proposed solution • Baseline with an example
  3. Background
  4. Background: Deformable Parts Model • Strong low-level features based on histograms of oriented gradients (HOG) • Efficient matching algorithms for deformable part-based models (pictorial structures) • Discriminative learning with latent variables (latent SVM) • Where to look? Everywhere (the sliding-window approach) • mean Average Precision (mAP): 33.7% - 33.4% P.F. Felzenszwalb et al., “Object Detection with Discriminatively Trained Part-Based Models”, PAMI 2010. J.J. Lim et al., “Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection”, CVPR 2013. X. Ren et al., “Histograms of Sparse Codes for Object Detection”, CVPR 2013.
  5. Background: Selective search • An alternative to exhaustive sliding-window search. • Starting from an over-segmentation, iteratively merge similar regions to produce region proposals. van de Sande et al., “Segmentation as Selective Search for Object Recognition”, ICCV 2011.
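The grouping step can be sketched in a few lines. This is a toy sketch, not the published algorithm: it assumes histogram-intersection similarity only (the real method also uses size, fill, and texture measures), and `selective_search_sketch` is a hypothetical helper name.

```python
import numpy as np

def similarity(h1, h2):
    # Histogram intersection, one of the similarity measures used by selective search.
    return np.minimum(h1, h2).sum()

def selective_search_sketch(histograms, neighbors):
    """Greedy hierarchical grouping over an initial over-segmentation.

    histograms: dict region_id -> color histogram (np.ndarray)
    neighbors:  set of frozenset({a, b}) pairs of adjacent region ids
    Returns the ids of merged regions in creation order; every intermediate
    group is emitted as a region proposal.
    """
    regions = dict(histograms)
    pairs = set(neighbors)
    proposals = []
    next_id = max(regions) + 1
    while pairs:
        # Merge the most similar adjacent pair.
        a, b = max(pairs, key=lambda p: similarity(*[regions[i] for i in sorted(p)]))
        regions[next_id] = regions[a] + regions[b]  # histograms add under merging
        proposals.append(next_id)
        # Rewire adjacency: anything touching a or b now touches the merged region.
        touched = {i for p in pairs if (a in p or b in p) for i in p} - {a, b}
        pairs = {p for p in pairs if a not in p and b not in p}
        pairs |= {frozenset({next_id, t}) for t in touched}
        del regions[a], regions[b]
        next_id += 1
    return proposals
```

Each merge yields one proposal, so a fine over-segmentation produces proposals at every scale.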
  6. Deep Learning happened, again! Krizhevsky et al., “ImageNet Classification with Deep Convolutional Neural Networks”, NIPS 2012. ImageNet 2012: whole-image classification with 1000 categories.
     Model                 Top-1 (val)  Top-5 (val)  Top-5 (test)
     1 CNN                 40.7%        18.2%        -
     5 CNNs                38.1%        16.4%        16.4%
     1 CNN (pre-trained)   39.0%        16.6%        -
     7 CNNs (pre-trained)  36.7%        15.4%        15.3%
     • Can it be used in object recognition? • Problems: • Localization: where is the object? • Annotation: labeled data is scarce. • Expensive computation for dense search.
  7. R-CNN: Region proposals + CNN. Approach summary: localization via selective search; feature extraction via a deep CNN; classification via binary linear SVMs.
  8. R-CNN: Input image. Girshick et al. CVPR14.
  9. R-CNN: Regions of Interest (RoIs) from a proposal method (~2k) on the input image. Girshick et al. CVPR14.
  10. R-CNN: Warped image regions from the RoIs. Girshick et al. CVPR14.
  11. R-CNN: Forward each warped region through a ConvNet. Girshick et al. CVPR14.
  12. R-CNN: Classify each region with SVMs. Girshick et al. CVPR14.
  13. R-CNN: Apply bounding-box regressors. Girshick et al. CVPR14.
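The bounding-box regressors on the last slide refine each proposal toward the ground truth. A minimal sketch of the standard R-CNN box parameterization (boxes here are center-x, center-y, width, height tuples):

```python
import numpy as np

def bbox_transform(proposal, ground_truth):
    """Regression targets (tx, ty, tw, th) in the R-CNN parameterization:
    center shifts are relative to the proposal size, scales are in log space."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return np.array([(gx - px) / pw,
                     (gy - py) / ph,
                     np.log(gw / pw),
                     np.log(gh / ph)])

def bbox_transform_inv(proposal, t):
    """Apply predicted targets to a proposal to obtain the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = t
    return np.array([px + tx * pw, py + ty * ph,
                     pw * np.exp(tw), ph * np.exp(th)])
```

The log-space scale targets keep the regression well behaved across box sizes; the two functions are exact inverses of each other.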
  14. What’s wrong with R-CNN?
  15. What’s wrong with R-CNN? • Ad hoc training objectives: fine-tune the network with a softmax classifier (log loss); train post-hoc linear SVMs (hinge loss); train post-hoc bounding-box regressors (squared loss).
  16. What’s wrong with R-CNN? • Ad hoc training objectives (as above) • Training is slow (84h) and takes a lot of disk space.
  17. What’s wrong with R-CNN? • Ad hoc training objectives • Training is slow (84h), takes a lot of disk space • Inference (detection) is slow: 47s / image with VGG16 [Simonyan & Zisserman, ICLR15], since there are ~2000 ConvNet forward passes per image. Fixed by SPP-net [He et al. ECCV14].
  18. SPP-net: Input image. He et al. ECCV14.
  19. SPP-net: Forward the whole image through the ConvNet to get the “conv5” feature map. He et al. ECCV14.
  20. SPP-net: Regions of Interest (RoIs) from a proposal method, projected onto the “conv5” feature map. He et al. ECCV14.
  21. SPP-net: Spatial Pyramid Pooling (SPP) layer over each RoI. He et al. ECCV14.
  22. SPP-net: Fully-connected layers (FCs) on the pooled features, then classify regions with SVMs. He et al. ECCV14.
  23. SPP-net: Apply bounding-box regressors. He et al. ECCV14.
  24. What’s good about SPP-net? • Fixes one issue with R-CNN: makes testing fast. The ConvNet is image-wise computation (shared); only the SVMs, FCs, and bbox regressors are region-wise computation.
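The SPP layer is what makes the shared computation possible: it turns a region of any size on the conv5 map into a fixed-length vector. A minimal numpy sketch, assuming a (1, 2, 4) pyramid of max-pooled grids:

```python
import numpy as np

def spp_pool(feature_map, levels=(1, 2, 4)):
    """Spatial pyramid pooling sketch: max-pool an HxWxC region of a feature
    map into a fixed-length vector, independent of H and W.

    feature_map: np.ndarray of shape (H, W, C), the conv5 crop for one RoI.
    levels: grid sizes of the pyramid (1x1 + 2x2 + 4x4 -> 21 bins per channel).
    """
    H, W, C = feature_map.shape
    pooled = []
    for n in levels:
        # Split the region into an n x n grid and take the max in each cell.
        ys = np.linspace(0, H, n + 1).astype(int)
        xs = np.linspace(0, W, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = feature_map[ys[i]:max(ys[i + 1], ys[i] + 1),
                                   xs[j]:max(xs[j + 1], xs[j] + 1)]
                pooled.append(cell.max(axis=(0, 1)))
    return np.concatenate(pooled)  # shape: (sum(n * n for n in levels) * C,)
```

Because the output length depends only on `levels` and `C`, regions of different sizes can all feed the same fully-connected layers.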
  25. What’s wrong with SPP-net? • Inherits the rest of R-CNN’s problems: ad hoc training objectives; training is slow (25h) and takes a lot of disk space. • Introduces a new problem: cannot update parameters below the SPP layer during training.
  26. SPP-net: the main limitation. SPP is not differentiable (as used), so only the 3 FC layers are trainable and the 13 conv layers stay frozen. He et al. ECCV14.
  27. Fast R-CNN • Fast test-time, like SPP-net
  28. Fast R-CNN • Fast test-time, like SPP-net • One network, trained in one stage
  29. Fast R-CNN • Fast test-time, like SPP-net • One network, trained in one stage • Higher mean average precision than R-CNN and SPP-net
  30. Fast R-CNN (test time): Forward the whole image through the ConvNet to get the “conv5” feature map; Regions of Interest (RoIs) come from a proposal method.
  31. Fast R-CNN (test time): “RoI Pooling” (single-level SPP) layer over each RoI.
  32. Fast R-CNN (test time): Fully-connected layers (FCs), then a Linear + softmax layer as the softmax classifier.
  33. Fast R-CNN (test time): A parallel Linear layer outputs the bounding-box regressors.
  34. Fast R-CNN (training): the same network (ConvNet, FCs, Linear + softmax, Linear).
  35. Fast R-CNN (training): Multi-task loss = log loss + smooth L1 loss.
  36. Fast R-CNN (training): the whole network is trainable end-to-end.
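The smooth L1 term in the multi-task loss can be sketched directly; with `beta=1` this matches the Fast R-CNN box-regression loss:

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    """Smooth L1 (Huber-style) loss used for the box-regression targets:
    quadratic near zero, linear for large residuals, so badly mislocalized
    boxes do not dominate the gradient the way squared loss would."""
    x = np.abs(x)
    return np.where(x < beta, 0.5 * x ** 2 / beta, x - 0.5 * beta)
```

The two branches meet with matching value and slope at `|x| = beta`, so the loss is differentiable everywhere, which matters for end-to-end training.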
  37. What is missing from the previous architectures? • All the previous architectures rely on an external region proposal algorithm. • The proposed regions are independent of the network loss. • No control over region quality.
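Region quality is usually quantified as intersection-over-union (IoU) with the ground-truth box; a minimal sketch with boxes as (x1, y1, x2, y2) corners:

```python
def iou(box_a, box_b):
    """Intersection over Union: 1.0 for a perfect match, 0.0 for disjoint
    boxes. PASCAL VOC counts a detection as correct when IoU >= 0.5."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```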
  38. Faster R-CNN • Fast test-time, like Fast R-CNN
  39. Faster R-CNN • Fast test-time, like Fast R-CNN • One network, trained in one stage
  40. Faster R-CNN • Fast test-time, like Fast R-CNN • One network, trained in one stage • Higher mean average precision than R-CNN, SPP-net, and Fast R-CNN
  41. Faster R-CNN • Fast test-time, like Fast R-CNN • One network, trained in one stage • Higher mean average precision than R-CNN, SPP-net, and Fast R-CNN • Has a dedicated Region Proposal Network (RPN) trained to optimize the network loss.
  42. Faster R-CNN: Input image; forward the whole image through the ConvNet.
  43. Faster R-CNN: also forward the whole image through the RPN ConvNet.
  44. Faster R-CNN (RPN head): a Linear + softmax layer and a Linear layer.
  45. Faster R-CNN (RPN head): the softmax classifier and the bounding-box regressors.
  46. Faster R-CNN: the “conv5” feature map of the image.
  47. Faster R-CNN: RPN proposals feed the “RoI Pooling” (single-level SPP) layer, the fully-connected layers (FCs), and the detection head’s softmax classifier and bounding-box regressors.
  48. Faster R-CNN: the full network, an RPN head plus a detection head, each with a softmax classifier and bounding-box regressors.
  49. Faster R-CNN: both the detection network and the Region Proposal Network are trainable, and sharing weights between them makes it super efficient.
  50. Problem definition
  51. Problem definition • All region-based CNN object detectors depend on the quality of the region proposal algorithm. • Although in Faster R-CNN the region proposal network is trained to minimize a multi-task loss (log loss and bounding-box regression), in my experiments the best proposed regions are still ill-localized.
  52. Problem definition (example): Top 1 region
  53. Problem definition (example): Top 1 region / Top 3 regions
  54. Problem definition (example): Top 1 region / Top 3 regions / Top 5 regions
  55. Problem definition (example): Top 1 region / Top 3 regions / Top 5 regions / Top 100 regions
  56. Proposed Solution
  57. Better regions with Bayesian Optimization. The goal now becomes sampling a new region y_(n+1) with a high chance of maximizing the score f(y_(n+1)).
  58. Better regions with Bayesian Optimization. Given the ability to query our CNN for region scores, we can repeat the following:
  59. Better regions with Bayesian Optimization: 1. Start from the existing regions and their scores.
  60. Better regions with Bayesian Optimization: 2. Fit a model to them.
  61. Better regions with Bayesian Optimization: 3. Introduce the chance (utility) function.
  62. Better regions with Bayesian Optimization: 4. Locate the maximum of the utility.
  63. Better regions with Bayesian Optimization: 5. Observe the new region’s score.
  64. Better regions with Bayesian Optimization: 6. Update the model.
  65. Better regions with Bayesian Optimization: 7. Repeat from step 2.
  66. Better regions with Bayesian Optimization: the full loop, steps 1 through 7.
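The loop above can be sketched end-to-end with a Gaussian-process surrogate and the expected-improvement (EI) utility, the standard choice for the "chance" function. This is a toy sketch under stated assumptions, not the talk's implementation: `score_fn` stands in for querying the CNN, boxes are treated as plain 4-D points, and the utility is maximized over a fixed candidate pool rather than by continuous optimization:

```python
import numpy as np
from math import erf, sqrt, pi

def rbf_kernel(A, B, length=1.0):
    # Squared-exponential kernel between two sets of points.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)

def gp_posterior(X, y, Xq, noise=1e-6, length=1.0):
    # Gaussian-process posterior mean/std at query points Xq (steps 2 and 6).
    K = rbf_kernel(X, X, length) + noise * np.eye(len(X))
    Kq = rbf_kernel(Xq, X, length)
    Kinv = np.linalg.inv(K)
    mu = Kq @ Kinv @ y
    var = 1.0 - np.einsum('ij,jk,ik->i', Kq, Kinv, Kq)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    # Closed-form EI under a Gaussian posterior (step 3).
    z = (mu - best) / sigma
    cdf = 0.5 * (1 + np.vectorize(erf)(z / sqrt(2)))
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2 * pi)
    return (mu - best) * cdf + sigma * pdf

def bayes_opt(score_fn, X0, candidates, iters=5):
    """X0: initial boxes (n, 4); candidates: pool of boxes to search (m, 4)."""
    X = np.array(X0, float)
    y = np.array([score_fn(x) for x in X])       # step 1: regions and scores
    for _ in range(iters):
        mu, sigma = gp_posterior(X, y, candidates)
        ei = expected_improvement(mu, sigma, y.max())
        x_new = candidates[np.argmax(ei)]        # step 4: maximize the utility
        X = np.vstack([X, x_new])                # step 5: observe the score
        y = np.append(y, score_fn(x_new))        # step 6: update the model data
    return X[np.argmax(y)], y.max()
```

EI trades off exploitation (high posterior mean) against exploration (high posterior uncertainty), which is why the heat maps in the following slides peak away from already-scored boxes.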
  67. Example of BO applied to R-CNN. Yuting Zhang, Kihyuk Sohn, Ruben Villegas, Gang Pan, and Honglak Lee.
  68. Original image. Yuting Zhang, Kihyuk Sohn, Ruben Villegas, Gang Pan, and Honglak Lee.
  69. Initial region proposals
  70. Initial detection (local optima)
  71. Initial detection & ground truth: neither gives good localization
  72. Iter 1: boxes inside the local search region
  73. Iter 1: heat map of expected improvement (EI) • A box has 4 coordinates: (centerX, centerY, height, width) • The height and width are marginalized by max to visualize EI in 2D
  74. Iter 1: heat map of expected improvement (EI)
  75. Iter 1: maximum of EI, the newly proposed box
  76. Iter 1: complete
  77. Iteration 2: local optimum & search region
  78. Iteration 2: EI heat map & new proposal
  79. Iteration 2: newly proposed box & its actual score
  80. Iteration 3: local optimum & search region
  81. Iteration 3: EI heat map & new proposal
  82. Iteration 3: newly proposed box & its actual score
  83. Iteration 4
  84. Iteration 5
  85. Iteration 6
  86. Iteration 7
  87. Iteration 8
  88. Final results
  89. Final results & ground truth
  90. Baseline
  91. Questions
