video understanding, deep learning, temporal action localization, machine learning, transformer, object detection, tricks, conference paper, figures, multi-modal training, computer vision, temporal action detection, hybrid learning, action localization