Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved.
Action Recognition
September 3, 2018
Katsunori Ohnishi
DeNA Co., Ltd.
■ Action recognition
Deep
Deep
Temporal Aggregation
■ Tips
Twitter: @ohnishi_ka
April 2014 - September 2017: B4~M2.5 Computer Vision
• http://katsunoriohnishi.github.io/
CVPR2016 (spotlight oral, acceptance rate=9.7%): egocentric vision (wrist-mounted camera)
ACMMM2016 (poster, acceptance rate=30%): action recognition (state-of-the-art)
AAAI2018 (oral, acceptance rate=10.9%): video generation (FTGAN)
October 2017 - : DeNA AI
• DeNA
→ https://www.wantedly.com/projects/209980
Action Recognition
Image classification
action recognition = human action recognition
• fine-grained egocentric
[Figure panels: Fine-grained, egocentric, Dog-centric, Action recognition, RGBD]
Evaluation of video activity localizations integrating quality and quantity measurements [C. Wolf+, CVIU14]
Recognizing Activities of Daily Living with a Wrist-mounted Camera [K. Ohnishi+, CVPR16]
A Database for Fine Grained Activity Detection of Cooking Activities [M. Rohrbach+, CVPR12]
First-Person Animal Activity Recognition from Egocentric Videos [Y. Iwashita+, ICPR14]
Recognizing Human Actions: A Local SVM Approach [C. Schuldt+, ICPR04]
HMDB: A Large Video Database for Human Motion Recognition [H. Kuehne+, ICCV11]
Ucf101: A dataset of 101 human actions classes from videos in the wild [K. Soomro+, arXiv2012]
■ KTH, UCF101, HMDB51
• UCF101: 101 classes, 13,320 videos …
■ ActivityNet, Kinetics, YouTube-8M
■ AVA, Moments in Time, SLAC
[Figure: UCF101]
■ YouTube-8M Video Understanding Challenge
https://www.kaggle.com/c/youtube8m
CVPR17 and ECCV18 workshops
Kaggle
frame-level
test
• Kaggle, action recognition
■ ActivityNet Challenge
http://activity-net.org/challenges/2018/
ActivityNet 3
• Temporal Proposal (T)
• Temporal localization (T)
• Video Captioning
• Kinetics: classification (human action)
• AVA: Spatio-temporal localization (XYT)
• Moments-in-time: classification (event)
CNN
2000
SIFT
local descriptor→coding global feature→
STIP [I. Laptev, IJCV05]
Dense Trajectory [H. Wang+, CVPR11]
Improved Dense Trajectory [H. Wang+, ICCV13]
• http://hirokatsukataoka.net/temp/presen/170121STAIRLab_slideshare.pdf
• https://arxiv.org/pdf/1605.04988.pdf
On space-time interest points [I. Laptev, IJCV05]
Action Recognition by Dense Trajectories [H. Wang+, CVPR11]
Action Recognition with Improved Trajectories [H. Wang+, ICCV13]
CNN
■ Improved Dense Trajectories (iDT) [H. Wang+, ICCV13]
Dense Trajectories [H. Wang+, CVPR11]
2
optical flow
foreground
optical flow
[Figure: Improved dense trajectories (green); background dense trajectories (white)]
CNN
SIFT Fisher Vector
Fisher vector
http://www.isi.imi.i.u-tokyo.ac.jp/~harada/pdf/SSII_harada20120608.pdf
https://www.slideshare.net/takao-y/fisher-vector
Pipeline: input → Local descriptor (iDT) → Video descriptor (Fisher Vector [F. Perronnin+, CVPR07]) → Classifier (SVM [F. Pedregosa+, JMLR11])
Fisher kernels on visual vocabularies for image categorization [F. Perronnin+, CVPR07]
CNN action recognition
CNN
Two-stream
• Hand-crafted feature
3D Convolution
• C3D
• C3D Two-stream
• 3D conv
Optical flow
CNN action recognition: CNN
■ Spatio-temporal ConvNet [A. Karpathy+, CVPR14]
CNN
AlexNet: RGB channels → 10 frames as channels (gray)
multi scale Fusion
Sports1M pre-training: UCF101 65.4% (iDT: 85.9%)
Large-scale video classification with convolutional neural networks [A. Karpathy+, CVPR14]
• 10 frames conv1 ch
• RGB gray frame-by-frame
score
CNN action recognition: Two-stream
■ Two-stream [K. Simonyan+, NIPS14]
2D CNN*
• Spatial-stream: RGB (input: RGB)
• Temporal-stream: Optical flow (input: optical flow 10 frames)
• Frame-by-frame
Hand-crafted feature CNN
Two-stream convolutional networks for action recognition in videos [K. Simonyan+, NIPS14]
Method                   UCF101  HMDB51
iDT                      85.9%   57.2%
Spatio-temporal ConvNet  65.4%   -
RGB-stream               73.0%   40.5%
Flow-stream              83.7%   54.6%
Two-stream               88.0%   59.4%
• 2DCNN
*ImageNet pre-trained
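The fusion of the spatial and temporal streams can be sketched as a late fusion of class scores. This is a minimal illustration, assuming each stream outputs per-class logits and that fusion is a weighted average of softmax scores (the two-stream paper reports averaging as one of its fusion options); the logits and the `flow_weight` parameter below are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_two_stream(rgb_logits, flow_logits, flow_weight=0.5):
    """Weighted average of the per-class softmax scores of both streams."""
    rgb = softmax(rgb_logits)
    flow = softmax(flow_logits)
    return [(1.0 - flow_weight) * r + flow_weight * f for r, f in zip(rgb, flow)]

# Hypothetical logits for 3 classes from each stream.
rgb_logits = [2.0, 1.0, 0.1]    # spatial stream prefers class 0
flow_logits = [0.5, 2.5, 0.2]   # temporal stream prefers class 1
fused = fuse_two_stream(rgb_logits, flow_logits)
predicted = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted)
```

With equal weights the more confident stream decides the prediction; raising `flow_weight` shifts the decision toward the optical-flow stream.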
CNN action recognition: 3D convolution
■ C3D [D. Tran+, ICCV15]
16-frame input, 3D convolution CNN
• XYT 3D convolution
UCF101 pre-training
ICCV15 arxiv 2 reject
Learning Spatiotemporal Features with 3D Convolutional Networks [D. Tran+, ICCV15]
Method      UCF101  HMDB51
iDT         85.9%   57.2%
Two-stream  88.0%   59.4%
C3D (1net)  82.3%   -
3D conv
CNN action recognition: 3D convolution
■ P3D [Z. Qiu+, ICCV17]
C3D
3D conv → 2D conv (XY) + 1D conv (T)
pre-training
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks [Z. Qiu+, ICCV17]
Method                  UCF101  HMDB51
iDT                     85.9%   57.2%
Two-stream (AlexNet)    88.0%   59.4%
P3D (ResNet)            88.6%   -
Two-stream (ResNet152)  91.8%
[Figure: Spatial 2D conv, Temporal 1D conv]
3D conv
again
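The effect of the "3D conv → 2D conv (XY) + 1D conv (T)" factorization on model size can be illustrated by counting kernel weights. This is a rough sketch under stated assumptions: biases are ignored, the two convolutions are applied in series, and the temporal convolution maps `c_out` channels to `c_out` channels. The function names and the channel width of 64 are hypothetical, not taken from the P3D paper.

```python
def conv3d_weights(c_in, c_out, k_t=3, k_h=3, k_w=3):
    """Weights of one full 3D convolution (k_t x k_h x k_w kernel)."""
    return c_in * c_out * k_t * k_h * k_w

def pseudo3d_weights(c_in, c_out, k_t=3, k_h=3, k_w=3):
    """Weights of a 1 x k_h x k_w spatial conv followed by a k_t x 1 x 1 temporal conv."""
    spatial = c_in * c_out * k_h * k_w   # 2D conv over XY
    temporal = c_out * c_out * k_t       # 1D conv over T
    return spatial + temporal

channels = 64  # hypothetical layer width
full = conv3d_weights(channels, channels)
factorized = pseudo3d_weights(channels, channels)
print(full, factorized, full / factorized)
```

For a 3x3x3 kernel the full convolution needs 27 weights per channel pair, while the factorized form needs 9 + 3 = 12, which is also why the spatial part can reuse 2D weights pre-trained on images.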
CNN action recognition: 3D convolution
■ C3D, P3D
3D conv
3D conv [K. Hara+, CVPR18]
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? [K. Hara+, CVPR18]
[Figure labels: 2012, 2011, 2015, 2017, 2017]
Kinetics!
CNN action recognition: 3D convolution
■ Kinetics
human action dataset!
3D conv
• Pre-train UCF101
The Kinetics human action video dataset [W. Kay+, arXiv17]
• YouTube-8M
CNN action recognition: 3D convolution
■ I3D [J. Carreira+, CVPR17]
Kinetics dataset DeepMind
3D conv Inception
64 GPUs for training, 16 GPUs for predict
state-of-the-art
• RGB
• Two-stream optical flow
score
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J. Carreira+, CVPR17]
Method          UCF101  HMDB51
RGB-I3D         95.6%   74.8%
Flow-I3D        96.7%   77.1%
Two-stream I3D  98.0%   80.7%
CNN action recognition: 3D convolution
■ I3D Two-stream
3D convolution
■ 3D conv XY T
• XY T
3D conv
[Figure axis: time]
CNN action recognition: 3D convolution
■ 3D convolution [D.A. Huang+, CVPR18]
• 3D CNN
• Two-stream I3D Optical flow 3D conv
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets [D.A. Huang+, CVPR18]
CNN action recognition: 3D convolution
■ 3D conv
CVPR18
CVPR/ICCV/ECCV
3D conv 3D conv
• GPU
CNN action recognition: Optical flow
■ Optical flow [L. Sevilla-Lara+, GCPR18]
• Optical flow
• Optical flow (EPE) action recognition
• flow action recognition
• Optical flow appearance
• Optical flow
On the Integration of Optical Flow and Action Recognition [L. Sevilla-Lara+, GCPR18]
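EPE (end-point error), the optical flow accuracy measure mentioned above, is the mean Euclidean distance between estimated and ground-truth flow vectors. A minimal sketch follows, assuming flow fields are given as flat lists of `(u, v)` pairs; the function name and the toy values are hypothetical.

```python
import math

def end_point_error(flow_pred, flow_gt):
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    if len(flow_pred) != len(flow_gt):
        raise ValueError("flow fields must have the same number of pixels")
    total = 0.0
    for (u, v), (u_gt, v_gt) in zip(flow_pred, flow_gt):
        total += math.hypot(u - u_gt, v - v_gt)
    return total / len(flow_pred)

# Toy flow fields with three pixels each.
pred = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
gt = [(1.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
epe = end_point_error(pred, gt)
print(epe)
```

A lower EPE means the flow is closer to the ground truth pixel by pixel, which is a different objective from classifying the action correctly.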
AVA [C. Gu+, CVPR18]: XYT bounding box, human action localization
Moments-in-time [M. Monfort+, arXiv2018]: 3-second videos
Kinetics-600 [W. Kay+, arXiv2017]: Kinetics extended from 400 to 600 classes
Temporal Aggregation
2D conv frame-by-frame 3D conv
(100 frames, 232 frames, 50 frames)
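Because videos differ in length (100, 232 and 50 frames in the example above), per-frame or per-clip outputs have to be aggregated into one fixed-size video-level vector. The simplest baseline is to average over time. The sketch below assumes each frame already has a score vector from a CNN; the function name and the toy scores are hypothetical.

```python
def mean_pool_over_time(frame_scores):
    """Average per-frame score vectors into one video-level vector."""
    n_frames = len(frame_scores)
    n_classes = len(frame_scores[0])
    return [sum(frame[c] for frame in frame_scores) / n_frames
            for c in range(n_classes)]

# Three toy videos of different lengths, each with 2-class frame scores.
videos = {
    "a": [[0.8, 0.2]] * 100,
    "b": [[0.4, 0.6]] * 232,
    "c": [[0.5, 0.5]] * 50,
}
video_vectors = {name: mean_pool_over_time(frames) for name, frames in videos.items()}
for name, vec in video_vectors.items():
    print(name, len(videos[name]), vec)
```

Every video ends up with a vector of the same length regardless of its frame count, at the cost of discarding the order of the frames.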
Temporal Aggregation
Score
→
LSTM
→
• FC
?
• fencing → fencing
→…
[Figure: per-frame CNN → LSTM → FC, repeated over time]
CVPR ACMMM AAAI
Temporal Aggregation
Pipeline: input → Local descriptor (iDT) → Video descriptor (Fisher Vector [F. Perronnin+, CVPR07]) → Classifier (SVM [F. Pedregosa+, JMLR11])
Fisher Vector
• CNN SIFT GMM
• FV VLAD [H. Jegou+, CVPR10]
Aggregating local descriptors into a compact image representation [H. Jegou+, CVPR10]
Temporal Aggregation
■ LCD [Z. Xu+, CVPR15]
VGG16 pool5: each XY location gives a 512-dim feature
• 224x224 input → 7x7=49 features
• VLAD global feature
A discriminative CNN video representation for event detection [Z. Xu+, CVPR15]
[Figure: input frames → CNN → Pool5 (e.g. 2x2x512) → Local descriptors → VLAD → global feature → SVM]
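The VLAD step that turns local descriptors into one global feature can be sketched as follows. This is a minimal pure-Python illustration, assuming the cluster centers were learned beforehand (for example with k-means) and using a single global L2 normalization; the toy 2-dim descriptors stand in for the 512-dim pool5 features, and all names are hypothetical.

```python
import math

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vlad_encode(descriptors, centers):
    """Sum residuals to the nearest center, concatenate, then L2-normalize."""
    k, d = len(centers), len(centers[0])
    residual_sums = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard assignment: index of the closest center.
        nearest = min(range(k), key=lambda i: squared_distance(x, centers[i]))
        for j in range(d):
            residual_sums[nearest][j] += x[j] - centers[nearest][j]
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

centers = [[0.0, 0.0], [10.0, 10.0]]                  # K=2 centers, D=2
descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]  # local descriptors from all frames
video_vector = vlad_encode(descriptors, centers)
print(video_vector)
```

The output has K x D entries no matter how many local descriptors the video contributes, so it can be fed directly to an SVM.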
Temporal Aggregation
■ ActionVLAD [R. Girdhar+, CVPR17]
NetVLAD [R. Arandjelović+, CVPR16]
• NetVLAD: VLAD as an NN layer, cluster assignment by softmax (soft assignment)
• VLAD LCD
VLAD
• End2end CNN !
ActionVLAD: Learning spatio-temporal aggregation for action classification [R. Girdhar+, CVPR17]
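The softmax cluster assignment is what makes the VLAD layer differentiable. The sketch below is a fixed-parameter illustration: it assumes the assignment logits are the negative squared distances scaled by a hypothetical `alpha`, whereas in NetVLAD the corresponding weights are learned, and it leaves out the normalization steps.

```python
import math

def soft_assign(x, centers, alpha=1.0):
    """Softmax over negative scaled squared distances to each center."""
    logits = [-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_vlad(descriptors, centers, alpha=1.0):
    """Accumulate residuals to every center, weighted by the soft assignment."""
    k, d = len(centers), len(centers[0])
    agg = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        weights = soft_assign(x, centers, alpha)
        for i in range(k):
            for j in range(d):
                agg[i][j] += weights[i] * (x[j] - centers[i][j])
    return agg

centers = [[0.0, 0.0], [3.0, 0.0]]
weights = soft_assign([1.0, 0.0], centers)
aggregated = soft_vlad([[1.0, 0.0], [2.5, 0.0]], centers)
print(weights, aggregated)
```

As `alpha` grows the soft assignment approaches the hard nearest-center assignment of plain VLAD.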
Temporal Aggregation
■ TLE [A. Diba+, CVPR17]
VLAD Compact Bilinear Pooling [Y. Gao+, CVPR16]
Temporal Aggregation
VLAD
• SVM VLAD NN
Deep Temporal Linear Encoding Networks [A. Diba+, CVPR17]
Tips
■ Two-stream (ResNet) 2D conv Optical flow
■ Single model State-of-the-art
I3D + TLE BA
64GPU
■ Two-stream optical flow GPU
• optical flow stream
• RGB-stream
Optical flow
Tips
CNN TLE coding
• TLE ActionVLAD
iDT
• CNN
• FisherVector iDT
Tips: PCA (dim=64), K=256, FV with power normalization
• CPU
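The Fisher Vector settings in the tip above (PCA to 64 dims, K=256 Gaussians, power normalization) can be made concrete with a short sketch. It assumes the common Fisher Vector form that keeps gradients with respect to the GMM means and variances, and a power of 0.5 followed by L2 normalization; the function names and the toy vector are hypothetical.

```python
import math

def fisher_vector_dim(num_gaussians, descriptor_dim):
    """Length of a Fisher Vector built from mean and variance gradients."""
    return 2 * num_gaussians * descriptor_dim

def power_l2_normalize(vec, power=0.5):
    """Signed power normalization followed by L2 normalization."""
    signed = [math.copysign(abs(v) ** power, v) for v in vec]
    norm = math.sqrt(sum(v * v for v in signed))
    return [v / norm for v in signed] if norm > 0 else signed

dim = fisher_vector_dim(num_gaussians=256, descriptor_dim=64)
normalized = power_l2_normalize([4.0, -9.0, 0.0])
print(dim, normalized)
```

Under these assumptions the video descriptor has 2 x 256 x 64 = 32,768 dimensions per descriptor type, which a linear SVM on CPU handles comfortably.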
Temporal Aggregation
Score
→
LSTM
→
• FC
?
• fencing → fencing
→…
[Figure: per-frame CNN → LSTM → FC, repeated over time]
CVPR ACMMM AAAI
input
↓
Two-stream
LSTM
3D conv
Optical flow
• [L. Sevilla-Lara+, GCPR18]
[Figure: per-frame CNN → LSTM → FC, repeated over time]
2D conv + LSTM 3D conv 3D conv
Two-stream
Optical flow
MoCoGAN [S. Tulyakov+, CVPR18]
VGAN [C. Vondrick+, NIPS16]
TGAN [M. Saito+, ICCV17]
FTGAN [K. Ohnishi+, AAAI18]
LRCN [J. Donahue+, CVPR15]
C3D [D. Tran+, ICCV15]
P3D [Z. Qiu+, ICCV17]
Two-stream [K. Simonyan+, NIPS14]
I3D [J. Carreira+, CVPR17]
Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture
K. Ohnishi+, AAAI 2018 (oral presentation)
https://arxiv.org/abs/1711.09618
Optical flow
Action classification
• Temporal action localization, Spatio-temporal localization
3D conv
Augmentation
■ Pose
Pose
• pose
• data distillation
■ Tips
& optical flow
Kinetics YouTube
XY → XYT: O(n^2) → O(n^3)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Dernier (20)

Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

The History and Latest Trends of Action Recognition

  • 1. Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved. Action Recognition. September 3, 2018. Katsunori Ohnishi, DeNA Co., Ltd.
  • 2. Action recognition; Deep; Deep; Temporal Aggregation; Tips
  • 3. Twitter: @ohnishi_ka. April 2014 to September 2017: B4~M2.5, Computer Vision ( http://katsunoriohnishi.github.io/ ): CVPR2016 (spotlight oral, acceptance rate=9.7%): egocentric vision (wrist-mounted camera); ACMMM2016 (poster, acceptance rate=30%): action recognition (state-of-the-art); AAAI2018 (oral, acceptance rate=10.9%): video generation (FTGAN). October 2017 to present: DeNA AI; DeNA → https://www.wantedly.com/projects/209980
  • 4. Action Recognition. Image classification; action recognition = human action recognition; fine-grained, egocentric. Figure labels: Fine-grained, egocentric, Dog-centric, Action recognition, RGBD. References: Evaluation of video activity localizations integrating quality and quantity measurements [C. Wolf+, CVIU14]; Recognizing Activities of Daily Living with a Wrist-mounted Camera [K. Ohnishi+, CVPR16]; A Database for Fine Grained Activity Detection of Cooking Activities [M. Rohrbach+, CVPR12]; First-Person Animal Activity Recognition from Egocentric Videos [Y. Iwashita+, ICPR14]; Recognizing Human Actions: A Local SVM Approach [C. Schuldt+, ICPR04]; HMDB: A Large Video Database for Human Motion Recognition [H. Kuehne+, ICCV11]; UCF101: A dataset of 101 human actions classes from videos in the wild [K. Soomro+, arXiv2012]
  • 5. KTH, UCF101, HMDB51 (UCF101: 101 classes, 13320 videos …); ActivityNet, Kinetics, YouTube-8M; AVA, Moments in Time, SLAC. Figure: UCF101
  • 6. YouTube-8M Video Understanding Challenge ( https://www.kaggle.com/c/youtube8m ): CVPR17, ECCV18 workshop; Kaggle; frame-level; test; Kaggle, action recognition. ActivityNet Challenge ( http://activity-net.org/challenges/2018/ ): ActivityNet, 3 tasks: Temporal Proposal (T), Temporal localization (T), Video Captioning; Kinetics: classification (human action); AVA: Spatio-temporal localization (XYT); Moments-in-time: classification (event)
  • 7. CNN. 2000: SIFT local descriptor → coding, global feature →. STIP [I. Laptev, IJCV04]; Dense Trajectory [H. Wang+, ICCV11]; Improved Dense Trajectory [H. Wang+, ICCV13]. Links: http://hirokatsukataoka.net/temp/presen/170121STAIRLab_slideshare.pdf ; https://arxiv.org/pdf/1605.04988.pdf ; References: On space-time interest points [I. Laptev, IJCV04]; Action Recognition by Dense Trajectories [H. Wang+, ICCV11]; Action Recognition with Improved Trajectories [H. Wang+, ICCV13]
  • 8. CNN. Improved Dense Trajectories (iDT) [H. Wang+, ICCV13]; Dense Trajectories [H. Wang+, ICCV11]. 2; optical flow; foreground optical flow. Figure: improved dense trajectories (green), background dense trajectories (white)
  • 9. CNN. SIFT; Fisher Vector. Fisher vector links: http://www.isi.imi.i.u-tokyo.ac.jp/~harada/pdf/SSII_harada20120608.pdf ; https://www.slideshare.net/takao-y/fisher-vector ; Pipeline: input … → local descriptor (iDT) → video descriptor (Fisher Vector [F. Perronnin+, CVPR07]) → classifier (SVM [F. Pedregosa+, JMLR11]). Reference: Fisher kernels on visual vocabularies for image categorization [F. Perronnin, CVPR07]
  • 10. CNN action recognition. CNN; Two-stream: Hand-crafted feature ( ); 3D Convolution: C3D; C3D, Two-stream; 3D conv, Optical flow
  • 11. CNN action recognition: CNN. Spatio-temporal ConvNet [A. Karpathy+, CVPR14]: CNN; AlexNet; RGB ch → 10 frames ch (gray); multi scale; Fusion; Sports1M pre-training; UCF101 65.4% (iDT 85.9%). 10 frames, conv1 ch; RGB, gray; frame-by-frame score ( ). Reference: Large-scale video classification with convolutional neural networks [A. Karpathy+, CVPR14]
  • 12. CNN action recognition: Two-stream. Two-stream [K. Simonyan+, NIPS14]: 2D CNN*; Spatial-stream: RGB (input: RGB); Temporal-stream: Optical flow (input: optical flow, 10 frames); Frame-by-frame; Hand-crafted feature, CNN. Results (UCF101 / HMDB51): iDT 85.9% / 57.2%; Spatio-temporal ConvNet 65.4% / -; RGB-stream 73.0% / 40.5%; Flow-stream 83.7% / 54.6%; Two-stream 88.0% / 59.4%. ( ); 2D CNN. *ImageNet pre-trained. Reference: Two-stream convolutional networks for action recognition in videos [K. Simonyan+, NIPS14]
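The two-stream row in the table above combines frame-by-frame scores from the RGB stream and the optical-flow stream. A minimal sketch of that late fusion follows, assuming each stream outputs per-frame class logits; the function names, the equal weighting, and the random inputs are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def two_stream_fusion(rgb_logits, flow_logits, flow_weight=0.5):
    """Average per-frame scores within each stream, then mix the two streams.

    rgb_logits, flow_logits: arrays of shape (num_frames, num_classes).
    flow_weight: hypothetical mixing weight for the temporal stream.
    """
    rgb_score = softmax(rgb_logits).mean(axis=0)    # video-level RGB score
    flow_score = softmax(flow_logits).mean(axis=0)  # video-level flow score
    return (1.0 - flow_weight) * rgb_score + flow_weight * flow_score

# Placeholder logits standing in for CNN outputs: 25 sampled frames, 101 classes.
rng = np.random.default_rng(0)
rgb = rng.normal(size=(25, 101))
flow = rng.normal(size=(25, 101))
video_score = two_stream_fusion(rgb, flow)
predicted_class = int(video_score.argmax())
```

Because both streams produce probability vectors, the fused score is still a probability vector, so the mixing weight can be tuned on a validation split without rescaling.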
  • 13. CNN action recognition: 3D convolution. C3D [D. Tran+, ICCV15]: 16 frames; 3D convolution CNN; XYT 3D convolution; UCF101; pre-training; ICCV15; arXiv; 2; reject. Results (UCF101 / HMDB51): iDT 85.9% / 57.2%; Two-stream 88.0% / 59.4%; C3D (1 net) 82.3% / -. Figure label: 3D conv. Reference: Learning Spatiotemporal Features with 3D Convolutional Networks [D. Tran+, ICCV15]
  • 14. CNN action recognition: 3D convolution. P3D [Z. Qiu+, ICCV17]: C3D; 3D conv → 2D conv (XY) + 1D conv (T); pre-training. Results (UCF101 / HMDB51): iDT 85.9% / 57.2%; Two-stream (AlexNet) 88.0% / 59.4%; P3D (ResNet) 88.6% / -. Figure labels: Spatial 2D conv, Temporal 1D conv. Reference: Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks [Z. Qiu+, ICCV17]
  • 15. CNN action recognition: 3D convolution. P3D [Z. Qiu+, ICCV17]: C3D; 3D conv → 2D conv (XY) + 1D conv (T); pre-training. Results (UCF101 / HMDB51): iDT 85.9% / 57.2%; Two-stream (AlexNet) 88.0% / 59.4%; P3D (ResNet) 88.6% / -; Two-stream (ResNet152) 91.8%. Figure labels: Spatial 2D conv, Temporal 1D conv. 3D conv again. Reference: Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks [Z. Qiu+, ICCV17]
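The P3D idea of replacing one 3D convolution with a 2D convolution over XY followed by a 1D convolution over T can be made concrete by counting weights. The sketch below assumes a serial arrangement, 3x3x3 kernels, and 64 channels throughout; the helper names and channel counts are hypothetical, and biases are ignored.

```python
def conv3d_params(c_in, c_out, k_t, k_h, k_w):
    """Number of weights in a 3D convolution (biases ignored)."""
    return c_in * c_out * k_t * k_h * k_w

def factorized_params(c_in, c_mid, c_out, k_t=3, k_s=3):
    """Weights of a spatial 1 x k x k conv followed by a temporal k x 1 x 1 conv."""
    spatial = conv3d_params(c_in, c_mid, 1, k_s, k_s)   # 2D conv over XY
    temporal = conv3d_params(c_mid, c_out, k_t, 1, 1)   # 1D conv over T
    return spatial + temporal

full = conv3d_params(64, 64, 3, 3, 3)        # one full 3 x 3 x 3 kernel
factorized = factorized_params(64, 64, 64)   # 2D + 1D replacement
```

Under these assumptions the factorized pair needs 49152 weights against 110592 for the full 3D kernel, and the spatial part has the same shape as a 2D kernel from an image model.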
  • 16. CNN action recognition: 3D convolution. C3D, P3D; 3D conv. 3D conv [K. Hara+, CVPR18]. Figure labels: 2012, 2011, 2015, 2017. Reference: Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? [K. Hara+, CVPR18]
  • 17. CNN action recognition: 3D convolution. C3D, P3D; 3D conv. 3D conv [K. Hara+, CVPR18]. Figure labels: 2012, 2011, 2015, 2017, 2017 Kinetics! Reference: Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? [K. Hara+, CVPR18]
  • 18. CNN action recognition: 3D convolution. Kinetics human action dataset! 3D conv; Pre-train; UCF101. YouTube-8M. Reference: The Kinetics human action video dataset [W. Kay+, arXiv17]
  • 19. CNN action recognition: 3D convolution. I3D [J. Carreira+, CVPR17]: Kinetics dataset; DeepMind; 3D conv Inception; 64 GPUs for training, 16 GPUs for prediction; state-of-the-art; RGB; Two-stream: optical flow, score. Results (UCF101 / HMDB51): RGB-I3D 95.6% / 74.8%; Flow-I3D 96.7% / 77.1%; Two-stream I3D 98.0% / 80.7%. … Reference: Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J. Carreira+, CVPR17]
  • 20. CNN action recognition: 3D convolution. I3D [J. Carreira+, CVPR17]: Kinetics dataset; DeepMind; 3D conv Inception; 64 GPUs for training, 16 GPUs for prediction; state-of-the-art; RGB; Two-stream: optical flow, score. Results (UCF101 / HMDB51): RGB-I3D 95.6% / 74.8%; Flow-I3D 96.7% / 77.1%; Two-stream I3D 98.0% / 80.7%. … ? Reference: Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J. Carreira+, CVPR17]
  • 21. CNN action recognition: 3D convolution. I3D; Two-stream; 3D convolution. ( ); 3D conv; XY, T; XY, T, 3D conv. Figure label: time
  • 22. CNN action recognition: 3D convolution. 3D convolution [D.A. Huang+, CVPR18]: 3D CNN; →; Two-stream I3D; Optical flow; 3D conv. Reference: What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets [D.A. Huang+, CVPR18]
  • 23. CNN action recognition: 3D convolution. 3D conv; CVPR18; CVPR/ICCV/ECCV; 3D conv; 3D conv; GPU
  • 24. CNN action recognition: Optical flow. Optical flow [L. Sevilla-Lara+, CVPR18]: Optical flow; Optical flow (EPE), action recognition; flow, action recognition; Optical flow, appearance; Optical flow. Reference: On the Integration of Optical Flow and Action Recognition [L. Sevilla-Lara+, CVPR18]
  • 25. AVA [C. Gu+, CVPR18]: XYZT bounding box, human action localization. Moments-in-time [M. Monfort+, arXiv2018]: 3. Kinetics-600 [W. Kay+, arXiv2017]: Kinetics 400 → 600
  • 26. Temporal Aggregation. 2D conv: frame-by-frame; 3D conv (100 frames, 232 frames, 50 frames)
  • 27. Temporal Aggregation. Score → LSTM →; FC ?; fencing → fencing → … Figure: CNN, LSTM, FC (×3); CVPR, ACMMM, AAAI, …
  • 28. Temporal Aggregation. Pipeline: input … → local descriptor (iDT) → video descriptor (Fisher Vector [F. Perronnin+, CVPR07]) → classifier (SVM [F. Pedregosa+, JMLR11]). → …! Fisher Vector: CNN; SIFT; GMM; FV; VLAD [H. Jegou+, CVPR10]. Reference: Aggregating local descriptors into a compact image representation [H. Jegou+, CVPR10]
  • 29. Temporal Aggregation. LCD [Z. Xu+, CVPR15]: VGG16 pool5, XY, 512dim feature; 224x224, feature 7x7=49; VLAD, global feature. Figure: input …, CNN, Pool5 (e.g. 2x2x512), local descriptors, VLAD, SVM, global feature. Reference: A discriminative CNN video representation for event detection [Z. Xu+, CVPR15]
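The LCD pipeline above treats each spatial position of pool5 as a local descriptor and pools all positions from all frames with VLAD. A minimal sketch of hard-assignment VLAD follows; the function name, the toy dimensions (64-d descriptors instead of 512-d), and the randomly picked centers (a stand-in for k-means) are assumptions for illustration.

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """Encode local descriptors (N, D) against K cluster centers (K, D).

    Each descriptor is assigned to its nearest center, residuals are summed
    per center, and the flattened (K * D) vector is L2-normalized.
    """
    sq_dist = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assignment = sq_dist.argmin(axis=1)             # hard assignment, shape (N,)
    num_centers, dim = centers.shape
    vlad = np.zeros((num_centers, dim))
    for k in range(num_centers):
        members = descriptors[assignment == k]
        if len(members) > 0:
            vlad[k] = (members - centers[k]).sum(axis=0)  # residual sum
    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

# Toy input: 10 frames x 49 positions (7x7), 64-d descriptors (placeholder values).
rng = np.random.default_rng(0)
local_descriptors = rng.normal(size=(10 * 49, 64))
# Placeholder codebook: 16 descriptors picked at random instead of running k-means.
centers = local_descriptors[rng.choice(len(local_descriptors), size=16, replace=False)]
video_feature = vlad_encode(local_descriptors, centers)
```

The output length depends only on the codebook size and descriptor dimension, so videos with different frame counts map to vectors of the same size for the SVM.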
  • 30. Temporal Aggregation. ActionVLAD [R. Girdhar+, CVPR17]; NetVLAD [R. Arandjelović+, CVPR16]: NetVLAD, VLAD, NN; cluster assign, softmax assign; VLAD, LCD, VLAD; End2end, CNN! Reference: ActionVLAD: Learning spatio-temporal aggregation for action classification [R. Girdhar+, CVPR17]
  • 31. Temporal Aggregation. TLE [A. Diba+, CVPR17]: VLAD; Compact Bilinear Pooling [Y. Gao+, CVPR16]; Temporal Aggregation; VLAD; SVM, VLAD, NN. Reference: Deep Temporal Linear Encoding Networks [A. Diba+, CVPR17]
  • 32. Tips. Two-stream (ResNet): 2D conv, Optical flow. Single model; State-of-the-art; I3D + TLE; BA; 64 GPUs. Two-stream; optical flow; GPU: optical flow stream; RGB-stream, Optical flow
  • 33. Tips. CNN; TLE; coding: TLE, ActionVLAD; iDT: CNN; FisherVector, iDT. Tips: PCA (dim=64), K=256, FV power norm; CPU
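The Fisher Vector tips above (PCA to 64 dimensions, K=256 Gaussians, power normalization) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: the descriptors are random placeholders for iDT features, the GMM parameters are placeholders rather than EM-fitted values, and the helper names are hypothetical.

```python
import numpy as np

def pca_fit(x, dim):
    """Return the data mean and a (D, dim) projection from an SVD of centered data."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:dim].T

def fisher_vector(x, weights, means, sigmas):
    """Fisher Vector of x (N, D) under a diagonal GMM with K components.

    Returns gradients w.r.t. means and standard deviations (length 2 * K * D),
    followed by power normalization and L2 normalization.
    """
    n = x.shape[0]
    diff = (x[:, None, :] - means[None]) / sigmas[None]            # (N, K, D)
    log_p = np.log(weights)[None] - 0.5 * (
        diff ** 2 + np.log(2 * np.pi * sigmas[None] ** 2)).sum(axis=2)
    log_p -= log_p.max(axis=1, keepdims=True)                      # stabilize
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                      # posteriors (N, K)
    g_mu = (gamma[:, :, None] * diff).sum(axis=0) / (n * np.sqrt(weights)[:, None])
    g_sigma = (gamma[:, :, None] * (diff ** 2 - 1)).sum(axis=0) / (
        n * np.sqrt(2 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                         # power norm
    return fv / max(np.linalg.norm(fv), 1e-12)                     # L2 norm

rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 96))          # placeholder local descriptors
mean, proj = pca_fit(raw, dim=64)         # PCA (dim=64)
reduced = (raw - mean) @ proj
K = 256                                   # number of Gaussians from the tips
weights = np.full(K, 1.0 / K)             # placeholder GMM, not fitted by EM
means = rng.normal(size=(K, 64))
sigmas = np.ones((K, 64))
fv = fisher_vector(reduced, weights, means, sigmas)
```

With dim=64 and K=256 the resulting vector has 2 x 256 x 64 = 32768 dimensions, which is why a linear SVM is the usual classifier on top.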
  • 34. Temporal Aggregation. Score → LSTM →; FC ?; fencing → fencing → … Figure: CNN, LSTM, FC (×3); CVPR, ACMMM, AAAI, …; input ↓ Two-stream
  • 35. LSTM; 3D conv; Optical flow; [L. Sevilla-Lara+, CVPR18]. Figure: CNN, LSTM, FC (×3)
  • 36. 2D conv + LSTM; 3D conv; 3D conv; Two-stream; Optical flow. MoCoGAN [S. Tulyakov+, CVPR18]; VGAN [C. Vondrick+, NIPS16]; TGAN [M. Saito+, ICCV17]; FTGAN [K. Ohnishi+, AAAI18]; LRCN [J. Donahue+, CVPR15]; C3D [D. Tran+, ICCV15]; P3D [Z. Qiu+, ICCV17]; Two-stream [K. Simonyan+, NIPS14]; I3D [J. Carreira+, CVPR17]; ( ) VGAN
  • 37. 2D conv + LSTM; 3D conv; 3D conv; Two-stream; Optical flow. MoCoGAN [S. Tulyakov+, CVPR18]; VGAN [C. Vondrick+, NIPS16]; TGAN [M. Saito+, ICCV17]; FTGAN [K. Ohnishi+, AAAI18]; LRCN [J. Donahue+, CVPR15]; C3D [D. Tran+, ICCV15]; P3D [Z. Qiu+, ICCV17]; Two-stream [K. Simonyan+, NIPS14]; I3D [J. Carreira+, CVPR17]; ( ) ! VGAN
  • 38. ! Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture, K. Ohnishi+, AAAI 2018 (oral presentation), https://arxiv.org/abs/1711.09618 ; Optical flow
  • 39. Action classification: Temporal action localization, Spatio-temporal localization; 3D conv; Augmentation. Pose; Pose: pose; data distillation. Tips & optical flow; Kinetics; YouTube
  • 40. XY, XYT: O(n^2) → O(n^3); !
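The jump from O(n^2) to O(n^3) when moving from XY to XYT can be illustrated by counting multiply-accumulate operations for one convolution layer. Reading n as the kernel side length is an assumption, as are the layer sizes and the helper name below.

```python
def conv_macs(out_positions, kernel_volume, c_in, c_out):
    """Multiply-accumulate count for one convolution layer (biases ignored)."""
    return out_positions * kernel_volume * c_in * c_out

n = 3                                                   # assumed kernel side length
macs_2d = conv_macs(56 * 56, n ** 2, 64, 64)            # one frame, n x n kernel
macs_3d = conv_macs(16 * 56 * 56, n ** 3, 64, 64)       # 16-frame clip, n x n x n kernel
ratio = macs_3d // macs_2d
```

Under these assumptions the 3D layer costs 48 times the 2D layer: a factor of 16 from the extra frames and a factor of n = 3 from the extra kernel dimension.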