
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Inspiration



Presentation by Kofi Appiah showing how TULIPP is used at Sheffield Hallam University



  1. Real-Time Modelling Visual Scenes with Biological Inspiration
     Kofi Appiah, Sheffield Hallam University
  2. AI now and before
     • Computer vision and natural language processing have improved significantly over the past 10 years.
     • Image recognition and classification systems: Apple photo organiser, Facebook face recognition.
     • Robots in warehouses: Amazon warehouse robots (https://www.youtube.com/watch?v=4sEVX4mPuto).
     • Medical image analysis for healthcare: non-invasive diagnosis.
     • Agriculture, sport, manufacturing, autonomous-car technology: crop yield, goal-line technology, defective products, people detection.
     Human-level face recognition: Taigman et al., CVPR 2014
  3. Why AI acceleration
     • Better algorithms that learn from examples rather than predefined rules: deep learning, neural networks, machine perception.
     • Availability of data (Big Data): internet images, YouTube videos, Facebook images.
     • High-performance computing: Field-Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs).
     IEEE Spectrum
  4. Key achievements
     • Visual recognition with high accuracy.
     • 3D reconstruction of an environment.
     Mask R-CNN, He et al., ICCV 2017; Litjens et al., 2017; Johnson et al., CVPR 2015; driverless cars (MathWorks); Faster R-CNN, TPAMI 2017
  5. Where things fall apart
     • March 18, 2018: Uber's autonomous car, a novel and imperfect system, hit and killed a 49-year-old woman as she was walking her bike across the street (https://www.youtube.com/watch?v=7iTshCm41Ko).
     • March 23, 2018: a Tesla on Autopilot slammed into a concrete barrier, killing the driver.
     • July 2016: a security robot struck a child in a shopping area.
     • Robots fail to open unfamiliar doors. Which training mode: reinforcement learning, supervised or unsupervised?
  6. Why things go wrong
     • For autonomous cars, the state of the art is good at providing bounding boxes of objects in the scene.
     • What is missing is an interpretation of the scene: there is no contextual reasoning.
     • Robot navigation: decision making might be optimal but not feasible or safe.
     • Modelling a crowded scene to infer interaction.
     • Modelling very unusual situations with little or no data.
     • Things that humans are capable of, e.g. dealing with complex scenes.
     Fei-Fei Li
  7. Unsupervised background subtraction
     • Image segmentation separates moving objects from the background.
     • Background subtraction is a practical approach when the image sensor is stationary.
     • Background-modelling techniques: unimodal and multimodal.
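For the stationary-sensor case, the simplest unimodal model is a single reference image. A minimal NumPy sketch (the function name and threshold are illustrative, not from the slides):

```python
import numpy as np

def subtract_background(frame, background, threshold=25):
    """Unimodal background subtraction: with a stationary sensor, mark
    pixels whose absolute difference from a single reference background
    image exceeds the threshold as moving foreground."""
    # Widen to signed ints so uint8 subtraction cannot wrap around
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold
```

In practice the reference image is itself updated over time (e.g. as a running average), which is where the unimodal/multimodal distinction above comes in.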
  8. W4 and Grimson's algorithm (2000s)
     • W4 requires manual initialization of the maximum (M), minimum (m) and inter-frame difference (D) maps.
     • Pixel x of image I_t is foreground if |m(x) − I_t(x)| > D(x) or |M(x) − I_t(x)| > D(x).
     • Detection, motion and change-history maps are used for outdoor scenes.
     • Uses fixed-point update values.
     • A bimodal model cannot handle problems like moving foliage and lighting changes.
     • Grimson's algorithm instead models each pixel with a mixture of Gaussians with associated weights, whose parameters are updated online.
     • The first B distributions, ordered by weight, represent the background.
     • Robust in modelling multimodal backgrounds, but suffers from a blending effect and uses floating point in all updates.
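The W4 foreground rule above translates directly to NumPy. A sketch, assuming the per-pixel maps m, M and D have already been learned:

```python
import numpy as np

def w4_foreground(frame, m, M, D):
    """W4-style test: pixel x is foreground when the current intensity
    I_t(x) deviates from the learned minimum m(x) or maximum M(x) by
    more than the inter-frame difference map D(x)."""
    # Signed arithmetic avoids uint8 wrap-around in the differences
    frame = frame.astype(np.int16)
    return (np.abs(m.astype(np.int16) - frame) > D) | \
           (np.abs(M.astype(np.int16) - frame) > D)
```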
  9. Efficient hardware implementation
     • Maintains K clusters, each with weight w_k, central value c_k and implied global range [c_k − 15, c_k + 15].
     • Weights and central values of all clusters are initialized to 0 and updated, per pixel (i,j), as:

       w_{k,t} = (63/64)·w_{k,t−1} + 1/64     for the matching cluster
       w_{k,t} = (63/64)·w_{k,t−1}            otherwise

       c_{k,t,i,j} = (7/8)·c_{k,t−1,i,j} + (1/8)·X_{i,j}   for the matching cluster
       c_{k,t,i,j} = c_{k,t−1,i,j}                          otherwise

       B = argmin_b ( Σ_{k=1..b} w_k > T )

     • Uses both pixel- and frame-level processing.
     • The first B distributions, ordered by weight, represent the background.
     Appiah et al., FPT 2005
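The shift-friendly updates above can be sketched in NumPy. Keeping weights as integers in units of 1/64 is an assumption about the scaling, and the function name is illustrative:

```python
import numpy as np

def update_clusters(pixel, centres, weights):
    """One per-pixel update: clusters whose implied range [c-15, c+15]
    covers the pixel gain weight (w <- 63/64 w + 1/64) and move 1/8 of
    the way toward it (c <- 7/8 c + 1/8 X); all other clusters decay
    (w <- 63/64 w). All multiplies/divides are by powers of two, so
    they reduce to shifts and adds in hardware."""
    matched = np.abs(centres - pixel) <= 15
    # Weights are stored in units of 1/64, so "+ 1/64" becomes "+ 1"
    weights = (63 * weights) // 64 + matched.astype(weights.dtype)
    centres = np.where(matched, (7 * centres + pixel) // 8, centres)
    return centres, weights
```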
  10. TULIPP: the game changer!
     • Tools that help real-time computer-vision developers focus on core application development by automating recurring but critical tasks such as:
       • performance instrumentation,
       • design-space exploration, and
       • vendor-tool configuration.
     • Makes it possible for the designer to get the required performance in speed, under power constraints, without having to worry too much about the architecture.
  11. Imaging before deep learning
     Before:
     • Standard feature detectors: SIFT, HOG, LBP.
     • Different algorithms for object detection.
     • Requires a small amount of data.
     • Useful for measurement and labelling.
     After:
     • Features are learnt and stacked according to the data.
     • The same algorithm adapts to the data.
     • Requires a huge volume of data.
     • Useful for labelling.
     MathWorks; Dalal & Triggs; cc.gatech.edu
  12. Deep CNN overview
     • Uses convolution to preserve the spatial structure of the input image.
     • Instead of a sigmoid activation function, the ReLU (rectified linear unit) is often used.
     • Encourages sparsity of synapses as activations approach zero.
     Credit: Fei-Fei Li, CS231n; Bala Amavasai, IEEE; M. Turner
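The ReLU mentioned above is a one-liner in NumPy:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x). Negative activations are
    zeroed, which gives the sparse responses mentioned on the slide."""
    return np.maximum(0, x)
```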
  13. Feature maps
     • Several feature maps are used to identify various local features.
     • Each convolution filter can be tuned to edges of different orientation, frequency, phase, colour, etc.
     • These capture some aspects of neural response, but neural data are not used in training.
  14. Sparse local connectivity
     • For an input image of size 7×7 and a 3×3 convolution filter, the output image will be 5×5.
     • Output size = (image − filter)/stride + 1.
     • Example: a sample filter for horizontal and vertical gradients.
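The output-size formula and a gradient filter can be checked with a short sketch; the Prewitt-style kernels are illustrative, since the slide does not give its exact filter:

```python
import numpy as np

def conv_output_size(image, kernel, stride=1):
    """Spatial size of a 'valid' convolution: (image - filter)/stride + 1,
    so a 3x3 filter over a 7x7 input gives a 5x5 output."""
    return (image - kernel) // stride + 1

# Illustrative Prewitt-style 3x3 gradient kernels
horizontal_gradient = np.array([[-1, -1, -1],
                                [ 0,  0,  0],
                                [ 1,  1,  1]])
vertical_gradient = horizontal_gradient.T
```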
  15. Way forward
     • Computer vision meets cognitive science and neuroscience.
     Fei-Fei Li, Justin Johnson & Serena Yeung
  16. The challenge
     • Success stories about Convolutional Neural Networks (CNNs) learning high-level features for object recognition keep increasing, thanks to the availability of large datasets like ImageNet.
     • However, scene recognition has not attained the same level of success, even though large scene databases like SUN and Places do exist.
     • Maybe the current deep features trained on ImageNet are not competitive enough for such tasks.
     • But do primates and humans actually do a raster scan to understand a scene?
     • CNNs also fail to capture the human insensitivity to perturbations of an image.
  17. Possible solution
     • Performance accuracy in CNNs relies on a huge search space, so more biological guidance from the visual cortex is needed.
     • Multi-disciplinary research in neuroscience, psychology and physiology shows that:
       • object recognition in the visual cortex is modulated via the ventral stream;
       • neuronal signals from the retina are transformed into high-level representations for object recognition.
     • Computer scientists working with neuroscientists, psychologists, etc. would produce better models for understanding scenes.
  18. Reported successes
     • A biologically inspired deep CNN model [Zhang et al. 2016]:
       • simulates the V1, V2, V4 and IT layers of the human ventral stream;
       • uses convolutional layers with varied sizes and complexities;
       • increases concurrency for improved processing speed;
       • outperformed seven other CNN techniques on four datasets.
     • You Only Look Once (YOLOv2) [Redmon and Farhadi, CVPR 2017]:
       • based on the assumption that humans glance at an image;
       • does not rely on a sliding window like other deep-learning approaches;
       • outperforms Deformable Part Models (DPM) and Region-based CNN.
  19. Scene understanding with DNNs
     • Learning Deep Features for Scene Recognition using Places Database [Zhou et al., NIPS 2014]:
       • uses a CNN to learn features from the scene;
       • combines various local and global features to understand the scene;
       • presents scene categories where machines perform like humans.
     • Humans, but Not Deep Neural Networks, Often Miss Giant Targets in Scenes [Eckstein et al., Current Biology 2017]:
       • humans often miss unusually sized targets during visual search;
       • deep learning does not exhibit such a deficit with targets. Is that a good thing or not?
  20. Our motivation
     • Missing giant targets is a functional brain strategy to discount distractors [Eckstein et al., Current Biology 2017].
  21. Our approach
     • To understand how humans and primates recognise scenes:
       • provide them with samples of indoor scenes;
       • ask them to identify specific objects;
       • observe their recall mechanism and whether spatial relationships play a role.
     • Model the scene to account for the experimental results:
       • incorporate global and local descriptors;
       • construct a relationship vector.
     Lunchroom image: PASSTA dataset
  22. Summary
     • Computer vision and machine learning have improved over the years, thanks to more data and processing power.
     • Global scene understanding is still a challenge.
     • A multi-disciplinary effort is required to take computer vision to the next level, acceptable for applications like driverless cars.
     • We aim to combine the strengths of CNNs with what humans are good at for scene understanding.
     • TULIPP offers the platform and toolchain to drive this agenda.