This document presents a proposal for a project on video saliency prediction using deep neural networks. The objectives are to understand state-of-the-art saliency models, to set a baseline model on the DHF1K dataset using SalGAN, and to explore complementary modalities, such as time dynamics, as input to SalGAN. Experiments include reviewing the evaluation metrics, setting a PyTorch SalGAN baseline on SALICON, fine-tuning the baseline on DHF1K, and adding extra inputs such as depth and coordinates, which improve performance. The conclusions discuss the project environment and code, state-of-the-art model performance, and boosting the baseline model on DHF1K video saliency prediction. Future work proposes exploring LSTMs, optical flow, and combining depth and CoordConv in different streams.
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019
1. Video Saliency Prediction
with Deep Neural Networks
Author:
Juan José Nieto
Advisors:
Eva Mohedano
Xavier Giró-i-Nieto
Kevin McGuinness
Barcelona, 5 February 2019
2. INDEX
1. Saliency Prediction
2. Objectives
3. Datasets and models
4. Proposal
5. Environment
6. Experiments
7. Conclusions
8. Future development
7. ● Understand what saliency models are and how they work. Study the state of the art.
● Set a baseline model based on SalGAN on the DHF1K dataset.
● Explore complementary
modalities to explicitly
model time dynamics as
an input for SalGAN.
Objectives
9. SalGAN
Image source and paper: Pan, J., Ferrer, C.C., McGuinness, K., O'Connor, N.E., Torres, J., Sayrol, E. and Giro-i-Nieto, X., 2017.
Salgan: Visual saliency prediction with generative adversarial networks. arXiv preprint arXiv:1701.01081.
10. SALICON
Image source: http://salicon.net/explore/
● Mouse-movement
● General and
task-free
● 10K TRAINING
● 5K VALIDATION
● 5K TEST
● Gaussian mask width: 24 pixels
Paper: Jiang, M., Huang, S., Duan, J. and Zhao, Q., 2015. Salicon: Saliency in context. In Proceedings of the IEEE conference on computer vision
and pattern recognition (pp. 1072-1080).
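The "Gaussian mask width" in each dataset refers to how binary fixation points are blurred into continuous ground-truth saliency maps: a Gaussian of the stated width (24 pixels for SALICON, 30 for DHF1K, 40 for LEDOV) is placed at each fixation. A minimal NumPy sketch of this idea, assuming a simple per-fixation Gaussian sum with max-normalisation (the function name and normalisation choice are my own; the datasets' official tooling may differ):

```python
import numpy as np

def fixations_to_saliency_map(fixations, height, width, sigma=24.0):
    """Blur binary fixation points into a continuous saliency map.

    `fixations` is a list of (row, col) points; `sigma` is the Gaussian
    width in pixels (e.g. 24 for SALICON, 30 for DHF1K, 40 for LEDOV).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    saliency = np.zeros((height, width), dtype=np.float64)
    for r, c in fixations:
        # Add an isotropic Gaussian centred on each fixation point.
        saliency += np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2.0 * sigma ** 2))
    if saliency.max() > 0:
        saliency /= saliency.max()  # normalise to [0, 1]
    return saliency

# Example: two fixations on a 120x160 frame with SALICON's sigma.
smap = fixations_to_saliency_map([(50, 50), (60, 80)], 120, 160, sigma=24.0)
```

The resulting map is the continuous target a saliency model regresses against, while the raw binary fixation mask is kept for location-based metrics such as NSS and the AUC variants.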
11. ACLNet
Image source and paper: Wang, W., Shen, J., Guo, F., Cheng, M.M. and Borji, A., 2018, January. Revisiting Video Saliency: A
Large-scale Benchmark and a New Model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp.
4894-4903).
12. DHF1K
Images source: DHF1K dataset.
● Eye-tracker
● General and task-free
● 600 VIDEOS TRAINING
● 100 VIDEOS VALIDATION
● 300 VIDEOS TEST
● Gaussian mask width: 30 pixels
Paper: Wang, W., Shen, J., Guo, F., Cheng, M.M. and Borji, A., 2018, January. Revisiting Video Saliency: A Large-scale
Benchmark and a New Model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp.
4894-4903).
13. DeepVS
Image source and paper: Jiang, L., Xu, M. and Wang, Z., 2017. Predicting Video Saliency with
Object-to-Motion CNN and Two-layer Convolutional LSTM. arXiv preprint arXiv:1709.06316.
14. LEDOV
Images source: LEDOV dataset.
● Eye-tracker
● General and task-free
● 436 VIDEOS TRAINING
● 41 VIDEOS VALIDATION
● 41 VIDEOS TEST
● Gaussian mask width: 40 pixels
24. Adding extra input signals

Validation results on DHF1K: ACLNet vs. the SalGAN baseline trained on SALICON vs. variants fine-tuned on DHF1K.

Metric     ACLNet   Baseline (SALICON)   RGB (DHF1K)   RGB + Depth (DHF1K)   RGB + CoordConv (DHF1K)
AUC_JUDD   0.890    0.872                0.880         0.895                 0.866
AUC_SHUF   0.601    0.666                0.632         0.648                 0.629
NSS        2.354    2.035                2.285         2.524                 2.072
CC         0.434    0.379                0.420         0.463                 0.389
SIM        0.315    0.267                0.339         0.351                 0.304
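Two of the metrics reported above can be stated compactly: NSS is the z-scored prediction averaged over ground-truth fixation locations, and CC is the Pearson correlation between the predicted and blurred ground-truth maps. A minimal NumPy sketch under those standard definitions (function names and the stabilising epsilon are my own; benchmark implementations may differ in detail):

```python
import numpy as np

def nss(pred, fixations):
    """Normalized Scanpath Saliency: z-score the predicted map, then
    average it over the ground-truth fixation locations (binary mask)."""
    z = (pred - pred.mean()) / (pred.std() + 1e-8)
    return float(z[fixations > 0].mean())

def cc(pred, gt):
    """Pearson linear Correlation Coefficient between the predicted map
    and the Gaussian-blurred ground-truth saliency map."""
    p = pred - pred.mean()
    g = gt - gt.mean()
    return float((p * g).sum() / (np.sqrt((p ** 2).sum() * (g ** 2).sum()) + 1e-8))
```

Note the asymmetry: NSS compares against the discrete fixation mask, while CC (like SIM) compares against the continuous blurred map, which is why both families of metrics are reported side by side.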
25. Conclusions
● Working environment set up. The project code is publicly available at https://github.com/juanjo3ns/SalBCE.
● Study of state-of-the-art models: SalGAN ranks second on the public leaderboard.
● Implementation of a PyTorch version of SalGAN that matches the original's performance when fine-tuned on SALICON.
● Boosted the performance of the baseline PyTorch model for video saliency prediction by fine-tuning it on the DHF1K dataset with RGB, RGB + depth, and RGB + coordinate inputs.
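The "extra input signals" experiments amount to stacking additional planes onto the RGB frame before the encoder: a depth map as one extra channel, and CoordConv-style normalised row/column coordinates as two more. A minimal NumPy sketch of that input construction, assuming channel-last frames and [-1, 1] coordinate planes (the function name and conventions are my own, not SalBCE's actual code):

```python
import numpy as np

def add_extra_channels(rgb, depth=None, coordconv=False):
    """Stack optional extra modalities onto an RGB frame of shape
    (H, W, 3) along the channel axis, as extra encoder input planes."""
    h, w, _ = rgb.shape
    channels = [rgb.astype(np.float32)]
    if depth is not None:
        # Depth map of shape (H, W) becomes one extra channel.
        channels.append(depth.astype(np.float32)[..., None])
    if coordconv:
        # Two CoordConv planes: row and column indices scaled to [-1, 1].
        ys = np.linspace(-1.0, 1.0, h)[:, None].repeat(w, axis=1)
        xs = np.linspace(-1.0, 1.0, w)[None, :].repeat(h, axis=0)
        channels.append(np.stack([ys, xs], axis=-1).astype(np.float32))
    return np.concatenate(channels, axis=-1)

# Example: 3 RGB + 1 depth + 2 coordinate channels -> 6 input channels.
x = add_extra_channels(np.zeros((96, 128, 3)), depth=np.zeros((96, 128)), coordconv=True)
```

In practice the first convolutional layer of the pretrained encoder has to be adapted to the new channel count, e.g. by zero- or small-random-initialising the weights for the extra planes so the RGB pretraining is preserved.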
26. Future Work
● LSTM
● Optical flow
● Combine with depth and CoordConv in different streams