Dividing and Aggregating Network for Multi-view Action Recognition
Dongang Wang¹, Wanli Ouyang¹,², Wen Li³, and Dong Xu¹
¹ School of Electrical and Information Engineering, The University of Sydney
² SenseTime Computer Vision Research Group, The University of Sydney
³ Computer Vision Laboratory, ETH Zurich
Motivation
• It is well known that feature variations caused by viewpoint changes can affect classification accuracy.
• We want to learn view-specific representations instead of extracting view-invariant features using global codebooks or dictionaries.
• The view-specific features can help each other because different feature extractors may have different activation areas.
Training details
• Backbone: temporal segment network (TSN) [1].
• Training has two stages. Stage 1: train the basic modules of the feature extractors for each view. Stage 2: fine-tune the extractors after adding the view classifier and the message passing modules.
• For the cross-subject setting, the number of branches equals the total number of views. For the cross-view setting, the number of branches equals the total number of views minus 1.
• Each branch duplicates parts of inception_5b.
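The branch-count rule above can be written as a small helper. This is a minimal sketch: the function name `num_branches`, the setting strings, and the example value of three views are illustrative assumptions, not part of the poster.

```python
def num_branches(total_views, setting):
    """Number of CNN branches for a given evaluation setting.

    Cross-subject: one branch per view.
    Cross-view: one view is held out, so one branch fewer.
    """
    if total_views < 1:
        raise ValueError("total_views must be at least 1")
    if setting == "cross-subject":
        return total_views
    if setting == "cross-view":
        return total_views - 1
    raise ValueError(f"unknown setting: {setting!r}")


# Illustrative example with three camera views.
print(num_branches(3, "cross-subject"))
print(num_branches(3, "cross-view"))
```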
Modules
• Basic Multi-branch Module
This module first extracts view-independent features with the shared CNN and then extracts view-specific features in each CNN branch. It is trained first so that each branch learns the basic knowledge of its view.
• Message Passing Module
Let f_v denote the view-specific feature and h_v the refined feature for each view v. We model their relationship with a conditional random field (CRF), whose solution expresses each refined feature h_v in terms of f_v and the refined features h_u of the other views. The W_{u,v} are the parameters of the fully connected layers learned between any two branches. We implement the message passing with two fully connected layers.
• View-prediction-guided Fusion Module
This module has two stages. First, for the i-th video, we combine the scores from all classifiers specific to the v-th view to form the view-specific score S_v^i. Then we use the view prediction scores as weights to generate the final action classification score T^i. The p_v^i are the view classification scores for the i-th video; they are generated from the view-independent features.
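The two fusion stages can be sketched in a few lines of plain Python. The sketch assumes that the view-specific score is a weighted sum S_v = Σ_u λ_{u,v} C_{u,v} of the classifier scores C_{u,v}, and that the final score is T = Σ_v p_v S_v. The weight table `lam`, the function name `fuse_scores`, and the toy numbers are illustrative assumptions, not values from the poster.

```python
def fuse_scores(C, lam, p):
    """View-prediction-guided fusion for one video (sketch).

    C[u][v] : list of per-class scores from view-specific classifier (u, v)
    lam[u][v]: assumed weight of classifier (u, v) when forming S_v
    p[v]    : view prediction score for view v, used as a fusion weight
    Returns the fused per-class score T.
    """
    num_branches = len(C)
    num_views = len(p)
    num_classes = len(C[0][0])
    T = [0.0] * num_classes
    for v in range(num_views):
        # Stage 1: combine all classifiers specific to view v into S_v.
        S_v = [
            sum(lam[u][v] * C[u][v][k] for u in range(num_branches))
            for k in range(num_classes)
        ]
        # Stage 2: weight S_v by the view prediction score p_v.
        T = [t + p[v] * s for t, s in zip(T, S_v)]
    return T


# Toy example: 2 branches, 2 views, 2 action classes.
C = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
lam = [[1.0, 0.5],
       [0.5, 1.0]]
p = [0.75, 0.25]
T = fuse_scores(C, lam, p)
predicted_class = max(range(len(T)), key=T.__getitem__)
```

In this toy case the view classifier favours view 0, so the classifiers specific to view 0 dominate the fused score.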
References
[1] Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., Van Gool, L.: Temporal segment networks: Towards good practices for deep action recognition. In: ECCV 2016
[2] Shahroudy, A., Liu, J., Ng, T.T., Wang, G.: NTU RGB+D: A large scale dataset for 3D human activity analysis. In: CVPR 2016
[3] Wang, J., Nie, X., Xia, Y., Wu, Y., Zhu, S.C.: Cross-view action modeling, learning and recognition. In: CVPR 2014
Contribution
• We design a multi-branch network for multi-view action recognition. The network is trained on RGB videos and the dense optical flow extracted from them, following the two-stream CNN scheme [1].
• A conditional random field (CRF) is introduced to pass messages among the view-specific features from different branches.
• We propose a view-prediction-guided fusion method for combining the action classification scores from multiple branches. The view prediction scores are used as the combination weights.
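The message passing among view-specific features can be illustrated with a small dependency-free sketch. It assumes the update h_v = (1/α_v)(α_v f_v + Σ_{u≠v} W_{u,v} h_u), initialised with h_u = f_u and run for a fixed number of rounds. It also simplifies each W_{u,v} to a single matrix, whereas the poster implements message passing with two fully connected layers. The names `message_passing`, `matvec`, and `alpha`, and the toy values, are hypothetical.

```python
def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def message_passing(f, W, alpha, rounds=1):
    """Refine view-specific features by passing messages between branches.

    f     : list of feature vectors, one per view
    W     : dict mapping (u, v) to the matrix that sends branch u's
            feature to branch v (assumed single linear map)
    alpha : list of positive per-view weights (assumed)
    """
    views = range(len(f))
    h = [list(fv) for fv in f]  # start from the unrefined features
    for _ in range(rounds):
        new_h = []
        for v in views:
            total = [alpha[v] * x for x in f[v]]
            for u in views:
                if u != v:
                    msg = matvec(W[(u, v)], h[u])
                    total = [t + m for t, m in zip(total, msg)]
            new_h.append([t / alpha[v] for t in total])
        h = new_h  # all branches are updated in parallel
    return h


# Toy example: two views, 2-D features, identity messages.
identity = [[1.0, 0.0], [0.0, 1.0]]
f = [[1.0, 2.0], [3.0, 4.0]]
W = {(0, 1): identity, (1, 0): identity}
h = message_passing(f, W, alpha=[1.0, 2.0])
```

A larger `alpha[v]` keeps the refined feature closer to the branch's own feature, since incoming messages are divided by it.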
Experiment results
We report accuracy on the NTU RGB+D dataset [2] and on the Northwestern-UCLA Multiview Action dataset [3]. We also ran an ablation study of each module on the NTU RGB+D dataset under the cross-view setting.
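[Figure: score fusion diagram with view prediction scores p_1, ..., p_V, view-specific scores S_1, ..., S_V, classifier scores C_{u,v} with weights λ_{u,v}, and the final action class score Y.]

Message passing (refined feature for view v):
$$h_v = \frac{1}{\alpha_v}\Big(\alpha_v f_v + \sum_{u \neq v} W_{u,v}\, h_u\Big)$$

View-specific score for the i-th video:
$$S_v^i = \sum_{u} \lambda_{u,v}\, C_{u,v}^i$$

Final action classification score for the i-th video:
$$T^i = \sum_{v=1}^{V} p_v^i\, S_v^i$$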
Problem
• Training: labeled videos from multiple views.
• Test: new samples from the known views.
• Test: a new sample from an unknown view.
[Figure: message passing for an input video from View B. Features in Branch A, Branch B, and Branch C; messages from A to B and from C to B; combined features from Branch B.]
[Figure: multi-branch CNN. Inception 5a output; 1x1 convolutions, 3x3 convolutions, and pooling; Inception 5b output. Shared CNN followed by CNN branch.]
[Figure: problem setting. Deep models predicting the action category.]
[Figure: network overview. Input: multi-view videos. Basic Multi-branch Module: shared CNN, CNN branches (1), (u), (V), view-independent feature, view-specific features (1), (u), (V). Message Passing Module: refined view-specific features (1), (u), (V). View-prediction-guided Fusion Module: view classifier and view prediction score; view-specific classifiers (1,1), (1,v), (u,1), (u,v) with scores C_{u,v}; score fusion; final action class score Y.]
Accuracy on NTU RGB+D [2]:
Methods          Modalities   Cross-Subject   Cross-View
STA-Hands        Pose+RGB     82.50%          88.60%
Baradel et al.   Pose+RGB     84.80%          90.60%
TSN              RGB          84.93%          85.36%
DA-Net (Ours)    RGB          88.12%          91.96%

Accuracy on Northwestern-UCLA Multiview Action [3]:
Methods          Cross-Subject   Cross-View
MST-AOG          81.6%           73.3%
Kong et al.      81.1%           77.2%
TSN              90.3%           80.6%
DA-Net (Ours)    92.1%           84.2%

Ablation on NTU RGB+D, cross-view setting:
Method               RGB-stream   Flow-stream   Two-stream
Basic multi-branch   73.9%        87.7%         89.8%
DA-Net (w/o msg.)    74.1%        88.4%         90.7%
DA-Net (w/o fus.)    74.5%        88.6%         90.9%
DA-Net               75.3%        88.9%         92.0%
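To read the ablation results at a glance, the gain of each variant over the basic multi-branch module can be computed from the two-stream column. This short sketch uses only the numbers in the table; the names `two_stream` and `gains_over_baseline` are illustrative.

```python
# Two-stream accuracies (%) copied from the ablation table
# (NTU RGB+D, cross-view setting).
two_stream = {
    "Basic multi-branch": 89.8,
    "DA-Net (w/o msg.)": 90.7,
    "DA-Net (w/o fus.)": 90.9,
    "DA-Net": 92.0,
}


def gains_over_baseline(acc, baseline="Basic multi-branch"):
    """Return each variant's gain over the baseline, in percentage points."""
    base = acc[baseline]
    return {
        name: round(value - base, 1)
        for name, value in acc.items()
        if name != baseline
    }


gains = gains_over_baseline(two_stream)
for name, gain in gains.items():
    print(f"{name}: +{gain} points")
```

On these numbers the full DA-Net improves on the basic multi-branch module by 2.2 points in the two-stream setting, more than either partial variant.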

Dividing and Aggregating Network for Multi-view Action Recognition [Poster in ECCV 2018]
