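10. 1. Proposed Method: Localization Layer
• Generates candidate regions for caption generation
• Each region is represented as a rectangle (a B x 4 tensor)
• The features for each candidate region (B x C x X x Y) are cropped from the
feature map extracted by the CNN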
[Johnson+ CVPR’16]
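The cropping step on this slide can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes boxes are given as (x1, y1, x2, y2) in feature-map coordinates and samples a fixed X x Y grid inside each box by bilinear interpolation. The function name `crop_region_features` and all sizes below are hypothetical and chosen only for illustration.

```python
import numpy as np

def crop_region_features(feature_map, boxes, out_h, out_w):
    """Crop a fixed-size feature grid for each box (illustrative sketch).

    feature_map: (C, H, W) array from a CNN.
    boxes: (B, 4) array of (x1, y1, x2, y2), assumed to be in
           feature-map coordinates.
    Returns a (B, C, out_h, out_w) array.
    """
    feature_map = np.asarray(feature_map, dtype=np.float64)
    C, H, W = feature_map.shape
    out = np.empty((len(boxes), C, out_h, out_w), dtype=np.float64)
    for b, (bx1, by1, bx2, by2) in enumerate(boxes):
        # Evenly spaced sampling positions inside the box, clipped to the map.
        ys = np.clip(np.linspace(by1, by2, out_h), 0, H - 1)
        xs = np.clip(np.linspace(bx1, bx2, out_w), 0, W - 1)
        # Integer neighbours and fractional weights for bilinear interpolation.
        y_lo = np.floor(ys).astype(int)
        x_lo = np.floor(xs).astype(int)
        y_hi = np.minimum(y_lo + 1, H - 1)
        x_hi = np.minimum(x_lo + 1, W - 1)
        wy = (ys - y_lo)[None, :, None]
        wx = (xs - x_lo)[None, None, :]
        rows_lo = feature_map[:, y_lo]  # (C, out_h, W)
        rows_hi = feature_map[:, y_hi]  # (C, out_h, W)
        top = rows_lo[:, :, x_lo] * (1 - wx) + rows_lo[:, :, x_hi] * wx
        bottom = rows_hi[:, :, x_lo] * (1 - wx) + rows_hi[:, :, x_hi] * wx
        out[b] = top * (1 - wy) + bottom * wy
    return out

# Usage: B = 2 boxes, C = 512 channels, X x Y = 7 x 7 (illustrative sizes).
features = np.random.rand(512, 30, 45)
boxes = np.array([[2.0, 3.0, 10.0, 12.0],
                  [0.0, 0.0, 44.0, 29.0]])
regions = crop_region_features(features, boxes, out_h=7, out_w=7)
print(regions.shape)  # (2, 512, 7, 7)
```

Sampling a fixed grid gives every region the same B x C x X x Y shape regardless of box size, which is what lets the regions be stacked into one tensor for the later stages.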
39. References
• J. Johnson et al. DenseCap: Fully Convolutional Localization Networks for Dense Captioning.
CVPR, 2016.
• O. Vinyals et al. Show and Tell: A Neural Image Caption Generator. CVPR, 2015.
• K. Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.
ICML, 2015.
• S. Ren et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal
Networks. NIPS, 2015.
• P. Anderson et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual
Question Answering. CVPR, 2018.
• T. Yao et al. Exploring Visual Relationship for Image Captioning. ECCV, 2018.
• R. Krishna et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense
Image Annotations. 2016.
• L. Yang et al. Dense Captioning with Joint Inference and Visual Context. CVPR, 2017.
• G. Yin et al. Context and Attribute Grounded Dense Captioning. CVPR, 2019.
• D.-J. Kim et al. Dense Relational Captioning: Triple-Stream Networks for Relationship-Based
Captioning. CVPR, 2019.