The goal of this work is to segment, in a video sequence, the objects mentioned in a linguistic description of the scene. We adapt an existing deep neural network that achieves state-of-the-art performance in semi-supervised video object segmentation by adding a linguistic branch that generates an attention map over the video frames, making the segmentation of the objects temporally consistent along the sequence.
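As a rough illustration of what an attention map over a video frame can look like, the sketch below scores every spatial location of a frame's feature grid against a sentence embedding and normalises the scores with a softmax. This is an assumption for illustration only, not the architecture used in this work: the function name `attention_map`, the dot-product scoring, and the toy feature values are all hypothetical.

```python
import math

def attention_map(frame_features, text_embedding):
    """Hypothetical language-conditioned spatial attention.

    frame_features: H x W grid (nested lists) of C-dimensional feature vectors.
    text_embedding: C-dimensional vector encoding the referring expression.
    Returns an H x W grid of weights that sum to 1.
    """
    # Dot product between each location's features and the text embedding.
    scores = [[sum(f * t for f, t in zip(cell, text_embedding)) for cell in row]
              for row in frame_features]
    # Subtract the maximum score before exponentiating, for numerical stability.
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    # Softmax over all spatial locations of the frame.
    return [[e / total for e in row] for row in exps]

# Toy 2 x 2 frame with 2-dimensional features (made-up values).
frame = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.0, 0.0], [1.0, 1.0]]]
text = [1.0, 1.0]
attn = attention_map(frame, text)
```

Applying the same `text_embedding` to every frame of a sequence would give one map per frame, each highlighting the locations whose features best match the description.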
Video Object Linguistic Grounding
1. Video Object Linguistic Grounding
Workshop on Multimodal Understanding & Learning for Embodied Applications (MULEA)
Nice, France, 25 October 2019
Carles Ventura, Alba M. Herrera, Xavier Giro-i-Nieto
3. #RVOS Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques and Xavier Giro-i-Nieto. “RVOS: End-to-End Recurrent Network for Video Object Segmentation”, CVPR 2019.
4. From Masks to Referring Expressions
[Figure: two diagrams of the model applied to frames along a time axis, labelled "One-shot RVOS [*]" and "Referring expression: “the woman”".]
[*] #RVOS Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques and Xavier Giro-i-Nieto. “RVOS: End-to-End Recurrent Network for Video Object Segmentation”, CVPR 2019.
5. Related Work
Khoreva, A., Rohrbach, A., & Schiele, B. Video object segmentation with language referring expressions. ACCV 2018.
6. Image Segmentation with Refers
#MAttNet Yu, L., Lin, Z., Shen, X., Yang, J., Lu, X., Bansal, M., & Berg, T. L. MAttNet: Modular attention network for referring expression comprehension. CVPR 2018.