CVPR, scene graph, self-supervised learning, FAIR, Transformer, Video Action Transformer Network, Transformer in vision, action recognition, CVPR 2019, missing modality, multimodal learning, Vision Transformer, Video Transformer, MAE, masked autoencoder, ICCV, VCR, visual commonsense reasoning, moment retrieval, video grounding, multimodal action recognition dataset, CVPR 2021, Google Research, OpenAI, UniT, ALIGN, DALL-E, CLIP, NIPS, ICML, ICLR, efficient Transformers, Reformer, Big Bird, Transformers are RNNs, Performer, An Image is Worth 16x16 Words, End-to-End Object Detection with Transformers, Image Transformer, CVPR 2020, scene graph generation, visual relationship detection, graph convolutional network, Graph R-CNN, ECCV 2018