Human Action Recognition Using
Attention Based Spatiotemporal Graph
Convolutional Network
Under the guidance of:
Prof. Anil Singh Parihar
Dept. of Computer Science &
Engineering
Submitted By:
Anshula Sharma
2K21/CSE/07
What is Human Action Recognition?
• Human Action Recognition (HAR) is concerned with predicting or classifying the actions being
performed by a human being.
• It is an important area of research.
• Different data modalities used for HAR:
1. Optical Flow
2. RGB Images
3. Body Skeletons
• Deep Learning techniques are used to predict and recognize human actions.
Problem Statement
• The major goal is to build and improve a model that can recognize and interpret human actions
using skeletal data.
• Traditional techniques frequently depend on RGB video data, which can be affected by lighting and
occlusions. By capturing human actions through the spatial configuration of body joints, skeleton-based
action recognition provides a more robust and efficient alternative.
• However, skeleton-based action identification faces a number of obstacles:
• It is still difficult to extract relevant information from skeletal data and properly capture the
spatiotemporal dynamics of human motions.
• Existing approaches usually process the body skeletons in the entire sequence that represents
the action performed. This strategy is inefficient in terms of computation time and memory
utilization.
• We propose an attention-based spatiotemporal graph convolutional network to overcome these
challenges.
Recognition using Skeleton based data
• Body skeletons are increasingly used for
human action recognition due to their
compact and action-focused nature.
• Skeletons are three-dimensional or two-
dimensional coordinate representations of
human body joints.
• Skeletons are represented as graphs,
where the graph’s nodes represent the
skeleton's joints and the edges of the
graph indicate the connections between
body joints.
• Actions can be identified from the different
motion patterns of the joints of the skeletal
body.
Graph Convolutional Networks
• Graph Convolutional Networks (GCNs) have become quite prominent in the field of skeleton-based
action recognition.
• Using deep feed-forward architectures, graph convolutional networks successfully capture the
spatiotemporal characteristics inherent in human skeletons.
• GCNs are a variant of Convolutional Neural Networks (CNNs) that generalize convolution to graph-
structured data.
• GCNs operate, in a manner similar to CNNs, by inspecting the neighboring nodes.
• The input is non-Euclidean, graph-structured data, in which each node has a varying number of
connections.
• Nodes and their connections (edges) with other nodes are represented with the help of an
adjacency matrix, which is then introduced to the forward propagation equation.
Graph Convolutional Network
• The initial input to the GCN is the graph's node
features along with the adjacency matrix. It
records local connection patterns as well as
information from surrounding nodes.
• A learnable weight matrix is then multiplied with the aggregated node features to compute the
transformed features for each node.
• Finally, a non-linear activation function is applied to
obtain the updated node representations. The
updated node representations then serve as
the next layer’s input, along with the adjacency
matrix.
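The aggregate-transform-activate update described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions (row-normalised adjacency with added self-loops, ReLU activation); it is not the authors' implementation.

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph-convolution layer: aggregate neighbours, transform, activate.

    X: (V, C_in) node features, A: (V, V) adjacency, W: (C_in, C_out) weights.
    """
    A_hat = A + np.eye(A.shape[0])        # self-loops: a node keeps its own features
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # degree normalisation
    H = D_inv @ A_hat @ X                 # aggregate information from neighbouring nodes
    return np.maximum(0.0, H @ W)         # linear transform + ReLU

# toy 3-joint chain 0-1-2: each joint's output averages itself and its neighbours
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.array([[1.0], [2.0], [3.0]])
W = np.array([[1.0]])
out = gcn_layer(X, A, W)                  # [[1.5], [2.0], [2.5]]
```

With the identity weight matrix, the middle joint's feature becomes the mean of all three joints (1+2+3)/3 = 2, showing how local connectivity drives the update.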
Dataset: NTU-RGB+D
• The model is experimented on a large-scale indoor action recognition dataset, NTU-RGB+D.
• It is one of the most comprehensive datasets with 3D annotations for human action recognition tasks.
• The video clips have been acquired by Microsoft Kinect v2 sensors.
• It contains 60 action classes and 56,880 action samples, with the actions carried out by 40
distinct test subjects.
• The action classes cover a wide variety of daily actions, including walking, waving, clapping, sitting down,
getting up, and playing musical instruments. It consists of both individual actions and interactions
between two subjects.
• The annotations that are obtained by the Kinect depth sensors offer 3D joint positions (X, Y, Z).
• A total of 25 joints are present in each subject in the skeleton series.
• The dataset consists of two evaluation benchmarks:
1. Cross-view (X-view): the videos are captured by 3 different cameras. The training set comprises 37,920 video clips and the
test set consists of 18,960 video clips.
2. Cross-subject (X-sub): the training and test sets contain different subjects. The training set has 40,320 videos and the test
set contains 16,560 videos.
Proposed Approach
• The proposed approach is an attention-based model for human action recognition that uses both
temporal and spatial attention modules to improve recognition accuracy.
• The temporal attention module selects the most informative frames from a sequence of skeletons,
capturing the action's critical temporal dynamics.
• Following that, the spatial attention mechanism highlights the most significant joints within the
selected frames, emphasizing their distinguishing characteristics.
• The computed attention scores are then used to select frames, allowing the identification of the
skeletons with the highest attention values.
• Both temporal and spatial relationships are effectively utilized in the skeletal data by incorporating
the attention modules into a graph convolutional network.
• Together, the temporal and spatial attention mechanisms improve the efficiency of skeleton-based
human action recognition, resulting in more accurate and robust identification.
Spatial Graph Convolution Module
• Spatial Graph Convolution block focuses on capturing spatial relationships.
• It operates by considering the nodes in the network as skeletal joints and the edges as connections that
describe interactions between these joints.
• The block aggregates information from nearby nodes in order to capture local interactions and
dependencies.
• The block accepts a tensor of shape (N, Cin, T, V) as input, where N denotes the batch size, T represents
the sequence length, Cin represents the number of input channels, and V represents the number of
joints.
• Along with the input tensor, an adjacency matrix is used which depicts pairwise interactions between
the joints.
• The block then runs a graph convolution operation, which updates the characteristics of each joint by
aggregating information from its nearby joints.
• Non-linear transformations are applied after the graph convolution to incorporate non-linearity
and improve the discriminative capability of the features.
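As a sketch, the block's operation on an (N, Cin, T, V) tensor might look like the following NumPy code; the einsum-based aggregation, the identity adjacency used for shape checking, and the 1x1 channel mixing are illustrative assumptions, not the actual implementation.

```python
import numpy as np

def spatial_gcn(x, A, W):
    """Spatial graph convolution on a skeleton tensor.

    x: (N, C_in, T, V) batch of skeleton sequences,
    A: (V, V) normalised adjacency, W: (C_in, C_out) channel weights.
    Each joint aggregates features from its graph neighbours within each frame.
    """
    agg = np.einsum('nctv,vw->nctw', x, A)    # aggregate along the joint axis
    out = np.einsum('nctv,cd->ndtv', agg, W)  # 1x1 "convolution": mix channels
    return np.maximum(0.0, out)               # non-linear transformation (ReLU)

N, C_in, T, V, C_out = 2, 3, 4, 25, 8
x = np.random.rand(N, C_in, T, V)
A = np.eye(V)                                 # identity adjacency, just for shape checking
W = np.random.rand(C_in, C_out)
y = spatial_gcn(x, A, W)                      # shape (2, 8, 4, 25): channels expanded
```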
Temporal Graph Convolution Module
• Temporal convolution block captures the dynamics of temporal information in the skeletal data.
• The block takes as input the output of the spatial graph convolution block.
• For temporal graph convolution, the neighborhood of each vertex is extended to include each node's
temporal neighbors.
• Each node in the preceding and following skeleton frames is linked to the same node, resulting in a
temporal neighborhood size of 2 for each node.
• A 2D convolution is performed to process the extended neighborhood, with a kernel size that
determines the convolution operation's temporal receptive field.
• The temporal convolution block aggregates the features of each individual body joint at various time
steps by using a fixed kernel size, allowing the model to capture the dynamics of temporal dimension
and patterns contained in the skeletal data.
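A minimal NumPy sketch of such a temporal convolution, applying a single kernel per joint and channel (equivalent to a 2D convolution with a (K, 1) kernel over the time-joint plane); the averaging kernel values are an assumption for illustration:

```python
import numpy as np

def temporal_conv(x, kernel):
    """Temporal convolution over a skeleton tensor.

    x: (N, C, T, V); kernel: (K,) 1-D temporal kernel applied per joint/channel.
    Zero-padding in time keeps the sequence length T unchanged.
    """
    K = len(kernel)
    pad = K // 2
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (0, 0)))  # pad only the time axis
    out = np.zeros_like(x)
    for k in range(K):                                    # slide the kernel over time
        out += kernel[k] * xp[:, :, k:k + x.shape[2], :]
    return out

# one joint, one channel, 8 frames with values 0..7
x = np.arange(8, dtype=float).reshape(1, 1, 8, 1)
y = temporal_conv(x, np.array([1/3, 1/3, 1/3]))  # each frame averages its temporal neighbours
```

With a kernel of size 3, every joint feature is combined with the same joint in the preceding and following frames, matching the temporal neighbourhood described above.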
Temporal Attention Block
• Temporal attention is used to detect relevant frames within a sequence of frames.
• Its goal is to emphasize and recognize the frames that make a significant contribution to the overall
interpretation of an action.
• The temporal attention module computes the average activation over all joints and channels for each
frame.
• It aggregates the collective information inside each frame, indicating its overall significance.
• The model learns the weights associated with each frame by passing the aggregated frame-level
activations through a linear layer.
• A sigmoid activation function is then applied to these weights, producing attention weights for
each frame.
• These attention weights serve as a mask, selectively amplifying or suppressing specific frame activations.
• The model can successfully focus on the frames that are deemed significant or informative for the given
action by applying attention weights to the input.
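The frame-averaging, linear-layer, and sigmoid steps above can be sketched as follows; the scalar weight/bias parameterisation of the linear layer is a hypothetical minimal form, not the model's actual layer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def temporal_attention(x, w, b):
    """Temporal attention over a skeleton tensor.

    x: (N, C, T, V) activations; w, b: weight/bias of a (hypothetical) scalar
    linear layer mapping each frame-level activation to a score.
    """
    frame_avg = x.mean(axis=(1, 3))        # (N, T): average over channels and joints
    scores = sigmoid(w * frame_avg + b)    # (N, T): one attention weight per frame
    masked = x * scores[:, None, :, None]  # mask amplifies/suppresses whole frames
    return masked, scores

x = np.random.rand(1, 3, 10, 25)           # 10 frames, 25 joints, 3 channels
y, scores = temporal_attention(x, w=1.0, b=0.0)
```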
Spatial Attention Block
• Spatial attention is concerned with identifying important joints within the frames highlighted in the
temporal attention module.
• It computes the average activation for each joint over all frames and channels.
• The joint-level activations are then transferred to a linear layer, which allows the model to learn the
weights associated with each joint.
• These weights are then fed into a sigmoid activation function, which produces attention weights that
indicate the relative significance of each joint.
• The generated attention weights operate as a mask, selectively adjusting the activations of individual
joints based on their relevance by applying the attention weights to the attended frames.
• Finally, a subset of the most informative skeletons is fetched from the given sequence.
• The skeletons are sorted in decreasing order according to their attention weights.
• These selected skeletons are then incorporated into the network's subsequent layers for additional
processing and analysis.
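A sketch of the joint-attention masking followed by top-k frame selection; the scalar linear layer and the value of k are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention_and_select(x, w, b, k):
    """Joint-level attention plus selection of the k highest-scoring frames.

    x: (N, C, T, V). w, b parameterise a (hypothetical) scalar linear layer.
    """
    joint_avg = x.mean(axis=(1, 2))            # (N, V): average over channels and frames
    joint_w = sigmoid(w * joint_avg + b)       # (N, V): attention weight per joint
    attended = x * joint_w[:, None, None, :]   # mask joints by their relevance
    frame_score = attended.mean(axis=(1, 3))   # (N, T): score per attended frame
    order = np.argsort(-frame_score, axis=1)[:, :k]  # indices, decreasing attention
    # gather the k selected frames along the time axis
    selected = np.take_along_axis(attended, order[:, None, :, None], axis=2)
    return selected

x = np.random.rand(2, 3, 10, 25)               # 10 frames per sequence
y = spatial_attention_and_select(x, 1.0, 0.0, k=4)  # keeps 4 frames: (2, 3, 4, 25)
```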
Network Architecture
• Skeleton data, comprising the coordinate locations of all the joints of the skeleton, is fed as the input
to the model.
• The model is made up of six graph convolutional layers and an attention module.
• Initially, two spatial graph convolutional layers are present in the model. Spatial Graph Convolution layer
focuses on capturing spatial relationships.
• Temporal and Spatial attention blocks are present after the spatial graph convolutional layers. The temporal
attention module captures the most informative frames from a sequence of skeletons. The spatial attention
mechanism emphasizes the most informative joints from the frames highlighted. Frame selection is then
performed to select the skeletons with the highest attention scores.
• The last four graph convolutional layers combine spatial and temporal convolutions. Temporal dynamics in the
skeletal data is captured by the temporal convolution block. The block takes as input the output of the spatial
graph convolution block.
• Each skeleton sequence's enhanced spatiotemporal characteristics are passed through a global average
pooling, yielding a 256-dimensional output feature vector.
• Finally, human actions are classified using a fully connected layer with a softmax classifier. The model is
trained end-to-end via backpropagation to reduce classification error.
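The layer stack described above can be sketched end-to-end in NumPy to check the shape flow. Random weights and an identity adjacency are used, and the attention/frame-selection stage is omitted for brevity; this is a shape-level illustration, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_gcn(x, A, W):
    """Spatial graph convolution: neighbour aggregation + channel mixing + ReLU."""
    h = np.einsum('nctv,vw->nctw', x, A)
    return np.maximum(0.0, np.einsum('nctv,cd->ndtv', h, W))

def temporal_conv(x, K=3):
    """Size-K temporal mean filter per joint/channel (zero-padded in time)."""
    pad = K // 2
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (0, 0)))
    return sum(xp[:, :, k:k + x.shape[2], :] for k in range(K)) / K

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

N, T, V, n_classes = 2, 30, 25, 60
x = rng.random((N, 3, T, V))       # (x, y, z) coordinates of 25 joints over 30 frames
A = np.eye(V)                      # placeholder adjacency

# two spatial-only layers, then four layers combining spatial + temporal convolution
layers = [(3, 64, False), (64, 64, False),
          (64, 128, True), (128, 128, True),
          (128, 256, True), (256, 256, True)]
for c_in, c_out, with_temporal in layers:
    x = spatial_gcn(x, A, 0.1 * rng.random((c_in, c_out)))
    if with_temporal:
        x = temporal_conv(x)

feat = x.mean(axis=(2, 3))         # global average pooling -> (N, 256) feature vector
probs = softmax(feat @ (0.1 * rng.random((256, n_classes))))  # (N, 60) class scores
```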
Experimental Results
• The model achieved Top-1 and Top-5
accuracies of 83.58% and 97.07% on the
cross-subject benchmark, and Top-1
and Top-5 accuracies of 91.22% and
98.85% on the cross-view benchmark.
• The model is compared with other notable
approaches on Top-1 accuracy for
both the cross-subject and cross-view
benchmarks.
Conclusion
• Utilization of temporal and spatial attention mechanisms helped in enhancing the model’s
performance.
• The model is evaluated on a widely used NTU-RGB+D benchmark, assessing its top-1 and top-5
accuracies on the dataset’s cross-view and cross-subject benchmarks.
• In comparison to other skeleton-based models, we observed a significant performance gap
between RNN- and CNN-based methods and our method. Furthermore, our method
outperformed other GCN-based models, demonstrating the benefit of incorporating spatial and
temporal attention mechanisms into a graph convolutional network, which enhanced the model's
accuracy and efficiency.
References
• S. Yan, Y. Xiong and D. Lin, "Spatial temporal graph convolutional networks for skeleton-based
action recognition," in Proceedings of the AAAI conference on artificial intelligence, 2018.
• N. Heidari and A. Iosifidis, "Temporal attention-augmented graph convolutional network for
efficient skeleton-based human action recognition," in 2020 25th International Conference on
Pattern Recognition (ICPR), 2021.
• T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks,"
in International Conference on Learning Representations, 2017.
• A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar and L. Fei-Fei, "Large-scale video
classification with convolutional neural networks," 2014.
• Y. Du, W. Wang and L. Wang, "Hierarchical recurrent neural network for skeleton based action
recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition,
2015.
THANK YOU

Contenu connexe

Similaire à final_project_1_2k21cse07.pptx

A Survey of Convolutional Neural Networks
A Survey of Convolutional Neural NetworksA Survey of Convolutional Neural Networks
A Survey of Convolutional Neural NetworksRimzim Thube
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxssuser3aa461
 
IRJET-Multiple Object Detection using Deep Neural Networks
IRJET-Multiple Object Detection using Deep Neural NetworksIRJET-Multiple Object Detection using Deep Neural Networks
IRJET-Multiple Object Detection using Deep Neural NetworksIRJET Journal
 
A Virtual Infrastructure for Mitigating Typical Challenges in Sensor Networks
A Virtual Infrastructure for Mitigating Typical Challenges in Sensor NetworksA Virtual Infrastructure for Mitigating Typical Challenges in Sensor Networks
A Virtual Infrastructure for Mitigating Typical Challenges in Sensor NetworksMichele Weigle
 
Deep Learning in Robotics: Robot gains Social Intelligence through Multimodal...
Deep Learning in Robotics: Robot gains Social Intelligence through Multimodal...Deep Learning in Robotics: Robot gains Social Intelligence through Multimodal...
Deep Learning in Robotics: Robot gains Social Intelligence through Multimodal...gabrielesisinna
 
[20240422_LabSeminar_Huy]Taming_Effect.pptx
[20240422_LabSeminar_Huy]Taming_Effect.pptx[20240422_LabSeminar_Huy]Taming_Effect.pptx
[20240422_LabSeminar_Huy]Taming_Effect.pptxthanhdowork
 
04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptx04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptxZainULABIDIN496386
 
A systematic image compression in the combination of linear vector quantisati...
A systematic image compression in the combination of linear vector quantisati...A systematic image compression in the combination of linear vector quantisati...
A systematic image compression in the combination of linear vector quantisati...eSAT Publishing House
 
Attention correlated appearance and motion feature followed temporal learning...
Attention correlated appearance and motion feature followed temporal learning...Attention correlated appearance and motion feature followed temporal learning...
Attention correlated appearance and motion feature followed temporal learning...IJECEIAES
 
CONVOLUTIONAL NEURAL NETWORK BASED FEATURE EXTRACTION FOR IRIS RECOGNITION
CONVOLUTIONAL NEURAL NETWORK BASED FEATURE EXTRACTION FOR IRIS RECOGNITION CONVOLUTIONAL NEURAL NETWORK BASED FEATURE EXTRACTION FOR IRIS RECOGNITION
CONVOLUTIONAL NEURAL NETWORK BASED FEATURE EXTRACTION FOR IRIS RECOGNITION AIRCC Publishing Corporation
 
CONVOLUTIONAL NEURAL NETWORK BASED FEATURE EXTRACTION FOR IRIS RECOGNITION
CONVOLUTIONAL NEURAL NETWORK BASED FEATURE EXTRACTION FOR IRIS RECOGNITION CONVOLUTIONAL NEURAL NETWORK BASED FEATURE EXTRACTION FOR IRIS RECOGNITION
CONVOLUTIONAL NEURAL NETWORK BASED FEATURE EXTRACTION FOR IRIS RECOGNITION ijcsit
 
Deep Convolutional 3D Object Classification from a Single Depth Image and Its...
Deep Convolutional 3D Object Classification from a Single Depth Image and Its...Deep Convolutional 3D Object Classification from a Single Depth Image and Its...
Deep Convolutional 3D Object Classification from a Single Depth Image and Its...Yuji Oyamada
 
Wits presentation 6_28072015
Wits presentation 6_28072015Wits presentation 6_28072015
Wits presentation 6_28072015Beatrice van Eden
 
Accurate and Energy-Efficient Range-Free Localization for Mobile Sensor Networks
Accurate and Energy-Efficient Range-Free Localization for Mobile Sensor NetworksAccurate and Energy-Efficient Range-Free Localization for Mobile Sensor Networks
Accurate and Energy-Efficient Range-Free Localization for Mobile Sensor Networksambitlick
 
On Learning Navigation Behaviors for Small Mobile Robots With Reservoir Compu...
On Learning Navigation Behaviors for Small Mobile Robots With Reservoir Compu...On Learning Navigation Behaviors for Small Mobile Robots With Reservoir Compu...
On Learning Navigation Behaviors for Small Mobile Robots With Reservoir Compu...gabrielesisinna
 
Crowd Density Estimation Using Base Line Filtering
Crowd Density Estimation Using Base Line FilteringCrowd Density Estimation Using Base Line Filtering
Crowd Density Estimation Using Base Line Filteringpaperpublications3
 

Similaire à final_project_1_2k21cse07.pptx (20)

C1804011117
C1804011117C1804011117
C1804011117
 
A Survey of Convolutional Neural Networks
A Survey of Convolutional Neural NetworksA Survey of Convolutional Neural Networks
A Survey of Convolutional Neural Networks
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
 
230727_HB_JointJournalClub.pptx
230727_HB_JointJournalClub.pptx230727_HB_JointJournalClub.pptx
230727_HB_JointJournalClub.pptx
 
IRJET-Multiple Object Detection using Deep Neural Networks
IRJET-Multiple Object Detection using Deep Neural NetworksIRJET-Multiple Object Detection using Deep Neural Networks
IRJET-Multiple Object Detection using Deep Neural Networks
 
A Virtual Infrastructure for Mitigating Typical Challenges in Sensor Networks
A Virtual Infrastructure for Mitigating Typical Challenges in Sensor NetworksA Virtual Infrastructure for Mitigating Typical Challenges in Sensor Networks
A Virtual Infrastructure for Mitigating Typical Challenges in Sensor Networks
 
Deep Learning in Robotics: Robot gains Social Intelligence through Multimodal...
Deep Learning in Robotics: Robot gains Social Intelligence through Multimodal...Deep Learning in Robotics: Robot gains Social Intelligence through Multimodal...
Deep Learning in Robotics: Robot gains Social Intelligence through Multimodal...
 
[20240422_LabSeminar_Huy]Taming_Effect.pptx
[20240422_LabSeminar_Huy]Taming_Effect.pptx[20240422_LabSeminar_Huy]Taming_Effect.pptx
[20240422_LabSeminar_Huy]Taming_Effect.pptx
 
04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptx04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptx
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
A systematic image compression in the combination of linear vector quantisati...
A systematic image compression in the combination of linear vector quantisati...A systematic image compression in the combination of linear vector quantisati...
A systematic image compression in the combination of linear vector quantisati...
 
Attention correlated appearance and motion feature followed temporal learning...
Attention correlated appearance and motion feature followed temporal learning...Attention correlated appearance and motion feature followed temporal learning...
Attention correlated appearance and motion feature followed temporal learning...
 
CONVOLUTIONAL NEURAL NETWORK BASED FEATURE EXTRACTION FOR IRIS RECOGNITION
CONVOLUTIONAL NEURAL NETWORK BASED FEATURE EXTRACTION FOR IRIS RECOGNITION CONVOLUTIONAL NEURAL NETWORK BASED FEATURE EXTRACTION FOR IRIS RECOGNITION
CONVOLUTIONAL NEURAL NETWORK BASED FEATURE EXTRACTION FOR IRIS RECOGNITION
 
CONVOLUTIONAL NEURAL NETWORK BASED FEATURE EXTRACTION FOR IRIS RECOGNITION
CONVOLUTIONAL NEURAL NETWORK BASED FEATURE EXTRACTION FOR IRIS RECOGNITION CONVOLUTIONAL NEURAL NETWORK BASED FEATURE EXTRACTION FOR IRIS RECOGNITION
CONVOLUTIONAL NEURAL NETWORK BASED FEATURE EXTRACTION FOR IRIS RECOGNITION
 
Deep Convolutional 3D Object Classification from a Single Depth Image and Its...
Deep Convolutional 3D Object Classification from a Single Depth Image and Its...Deep Convolutional 3D Object Classification from a Single Depth Image and Its...
Deep Convolutional 3D Object Classification from a Single Depth Image and Its...
 
Wits presentation 6_28072015
Wits presentation 6_28072015Wits presentation 6_28072015
Wits presentation 6_28072015
 
Accurate and Energy-Efficient Range-Free Localization for Mobile Sensor Networks
Accurate and Energy-Efficient Range-Free Localization for Mobile Sensor NetworksAccurate and Energy-Efficient Range-Free Localization for Mobile Sensor Networks
Accurate and Energy-Efficient Range-Free Localization for Mobile Sensor Networks
 
On Learning Navigation Behaviors for Small Mobile Robots With Reservoir Compu...
On Learning Navigation Behaviors for Small Mobile Robots With Reservoir Compu...On Learning Navigation Behaviors for Small Mobile Robots With Reservoir Compu...
On Learning Navigation Behaviors for Small Mobile Robots With Reservoir Compu...
 
Crowd Density Estimation Using Base Line Filtering
Crowd Density Estimation Using Base Line FilteringCrowd Density Estimation Using Base Line Filtering
Crowd Density Estimation Using Base Line Filtering
 

Dernier

Contact Number Call Girls Service In Goa 9316020077 Goa Call Girls Service
Contact Number Call Girls Service In Goa  9316020077 Goa  Call Girls ServiceContact Number Call Girls Service In Goa  9316020077 Goa  Call Girls Service
Contact Number Call Girls Service In Goa 9316020077 Goa Call Girls Servicesexy call girls service in goa
 
Call Girls Magarpatta Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Magarpatta Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Magarpatta Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Magarpatta Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
VIP Call Girls Service Chaitanyapuri Hyderabad Call +91-8250192130
VIP Call Girls Service Chaitanyapuri Hyderabad Call +91-8250192130VIP Call Girls Service Chaitanyapuri Hyderabad Call +91-8250192130
VIP Call Girls Service Chaitanyapuri Hyderabad Call +91-8250192130Suhani Kapoor
 
Low Rate Call Girls Bikaner Anika 8250192130 Independent Escort Service Bikaner
Low Rate Call Girls Bikaner Anika 8250192130 Independent Escort Service BikanerLow Rate Call Girls Bikaner Anika 8250192130 Independent Escort Service Bikaner
Low Rate Call Girls Bikaner Anika 8250192130 Independent Escort Service BikanerSuhani Kapoor
 
Proposed Amendments to Chapter 15, Article X: Wetland Conservation Areas
Proposed Amendments to Chapter 15, Article X: Wetland Conservation AreasProposed Amendments to Chapter 15, Article X: Wetland Conservation Areas
Proposed Amendments to Chapter 15, Article X: Wetland Conservation Areas💥Victoria K. Colangelo
 
Freegle User Survey as visual display - BH
Freegle User Survey as visual display - BHFreegle User Survey as visual display - BH
Freegle User Survey as visual display - BHbill846304
 
VIP Call Girl Gorakhpur Aashi 8250192130 Independent Escort Service Gorakhpur
VIP Call Girl Gorakhpur Aashi 8250192130 Independent Escort Service GorakhpurVIP Call Girl Gorakhpur Aashi 8250192130 Independent Escort Service Gorakhpur
VIP Call Girl Gorakhpur Aashi 8250192130 Independent Escort Service GorakhpurSuhani Kapoor
 
Booking open Available Pune Call Girls Parvati Darshan 6297143586 Call Hot I...
Booking open Available Pune Call Girls Parvati Darshan  6297143586 Call Hot I...Booking open Available Pune Call Girls Parvati Darshan  6297143586 Call Hot I...
Booking open Available Pune Call Girls Parvati Darshan 6297143586 Call Hot I...Call Girls in Nagpur High Profile
 
Sustainable Clothing Strategies and Challenges
Sustainable Clothing Strategies and ChallengesSustainable Clothing Strategies and Challenges
Sustainable Clothing Strategies and ChallengesDr. Salem Baidas
 
VIP Call Girls Service Bandlaguda Hyderabad Call +91-8250192130
VIP Call Girls Service Bandlaguda Hyderabad Call +91-8250192130VIP Call Girls Service Bandlaguda Hyderabad Call +91-8250192130
VIP Call Girls Service Bandlaguda Hyderabad Call +91-8250192130Suhani Kapoor
 
The Most Attractive Pune Call Girls Shirwal 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Shirwal 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Shirwal 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Shirwal 8250192130 Will You Miss This Cha...ranjana rawat
 
Booking open Available Pune Call Girls Budhwar Peth 6297143586 Call Hot Indi...
Booking open Available Pune Call Girls Budhwar Peth  6297143586 Call Hot Indi...Booking open Available Pune Call Girls Budhwar Peth  6297143586 Call Hot Indi...
Booking open Available Pune Call Girls Budhwar Peth 6297143586 Call Hot Indi...Call Girls in Nagpur High Profile
 
Russian Call Girls Nashik Anjali 7001305949 Independent Escort Service Nashik
Russian Call Girls Nashik Anjali 7001305949 Independent Escort Service NashikRussian Call Girls Nashik Anjali 7001305949 Independent Escort Service Nashik
Russian Call Girls Nashik Anjali 7001305949 Independent Escort Service Nashikranjana rawat
 
(PARI) Viman Nagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune ...
(PARI) Viman Nagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune ...(PARI) Viman Nagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune ...
(PARI) Viman Nagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune ...ranjana rawat
 

Dernier (20)

Green Marketing
Green MarketingGreen Marketing
Green Marketing
 
Call Girls In Dhaula Kuan꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCe
Call Girls In Dhaula Kuan꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCeCall Girls In Dhaula Kuan꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCe
Call Girls In Dhaula Kuan꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCe
 
Contact Number Call Girls Service In Goa 9316020077 Goa Call Girls Service
Contact Number Call Girls Service In Goa  9316020077 Goa  Call Girls ServiceContact Number Call Girls Service In Goa  9316020077 Goa  Call Girls Service
Contact Number Call Girls Service In Goa 9316020077 Goa Call Girls Service
 
Call Girls Magarpatta Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Magarpatta Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Magarpatta Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Magarpatta Call Me 7737669865 Budget Friendly No Advance Booking
 
VIP Call Girls Service Chaitanyapuri Hyderabad Call +91-8250192130
VIP Call Girls Service Chaitanyapuri Hyderabad Call +91-8250192130VIP Call Girls Service Chaitanyapuri Hyderabad Call +91-8250192130
VIP Call Girls Service Chaitanyapuri Hyderabad Call +91-8250192130
 
Call Girls In Yamuna Vihar꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCe
Call Girls In Yamuna Vihar꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCeCall Girls In Yamuna Vihar꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCe
Call Girls In Yamuna Vihar꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCe
 
Low Rate Call Girls Bikaner Anika 8250192130 Independent Escort Service Bikaner
Low Rate Call Girls Bikaner Anika 8250192130 Independent Escort Service BikanerLow Rate Call Girls Bikaner Anika 8250192130 Independent Escort Service Bikaner
Low Rate Call Girls Bikaner Anika 8250192130 Independent Escort Service Bikaner
 
Call Girls In Pratap Nagar꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCe
Call Girls In Pratap Nagar꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCeCall Girls In Pratap Nagar꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCe
Call Girls In Pratap Nagar꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCe
 
Proposed Amendments to Chapter 15, Article X: Wetland Conservation Areas
Proposed Amendments to Chapter 15, Article X: Wetland Conservation AreasProposed Amendments to Chapter 15, Article X: Wetland Conservation Areas
Proposed Amendments to Chapter 15, Article X: Wetland Conservation Areas
 
Freegle User Survey as visual display - BH
Freegle User Survey as visual display - BHFreegle User Survey as visual display - BH
Freegle User Survey as visual display - BH
 
Green Banking
Green Banking Green Banking
Green Banking
 
VIP Call Girl Gorakhpur Aashi 8250192130 Independent Escort Service Gorakhpur
VIP Call Girl Gorakhpur Aashi 8250192130 Independent Escort Service GorakhpurVIP Call Girl Gorakhpur Aashi 8250192130 Independent Escort Service Gorakhpur
VIP Call Girl Gorakhpur Aashi 8250192130 Independent Escort Service Gorakhpur
 
Booking open Available Pune Call Girls Parvati Darshan 6297143586 Call Hot I...
Booking open Available Pune Call Girls Parvati Darshan  6297143586 Call Hot I...Booking open Available Pune Call Girls Parvati Darshan  6297143586 Call Hot I...
Booking open Available Pune Call Girls Parvati Darshan 6297143586 Call Hot I...
 
Sustainable Clothing Strategies and Challenges
Sustainable Clothing Strategies and ChallengesSustainable Clothing Strategies and Challenges
Sustainable Clothing Strategies and Challenges
 
Gandhi Nagar (Delhi) 9953330565 Escorts, Call Girls Services
Gandhi Nagar (Delhi) 9953330565 Escorts, Call Girls ServicesGandhi Nagar (Delhi) 9953330565 Escorts, Call Girls Services
Gandhi Nagar (Delhi) 9953330565 Escorts, Call Girls Services
 
VIP Call Girls Service Bandlaguda Hyderabad Call +91-8250192130
VIP Call Girls Service Bandlaguda Hyderabad Call +91-8250192130VIP Call Girls Service Bandlaguda Hyderabad Call +91-8250192130
VIP Call Girls Service Bandlaguda Hyderabad Call +91-8250192130
 
The Most Attractive Pune Call Girls Shirwal 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Shirwal 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Shirwal 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Shirwal 8250192130 Will You Miss This Cha...
 
Booking open Available Pune Call Girls Budhwar Peth 6297143586 Call Hot Indi...
Booking open Available Pune Call Girls Budhwar Peth  6297143586 Call Hot Indi...Booking open Available Pune Call Girls Budhwar Peth  6297143586 Call Hot Indi...
Booking open Available Pune Call Girls Budhwar Peth 6297143586 Call Hot Indi...
 
Russian Call Girls Nashik Anjali 7001305949 Independent Escort Service Nashik
Russian Call Girls Nashik Anjali 7001305949 Independent Escort Service NashikRussian Call Girls Nashik Anjali 7001305949 Independent Escort Service Nashik
Russian Call Girls Nashik Anjali 7001305949 Independent Escort Service Nashik
 
(PARI) Viman Nagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune ...
(PARI) Viman Nagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune ...(PARI) Viman Nagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune ...
(PARI) Viman Nagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune ...
 

final_project_1_2k21cse07.pptx

  • 1. Human Action Recognition Using Attention Based Spatiotemporal Graph Convolutional Network Under the guidance of: Prof. Anil Singh Parihar Dept. of Computer Science & Engineering Submitted By: Anshula Sharma 2K21/CSE/07
  • 2. What is Human Action Recognition? • Human Action Recognition or (HAR) is concerned with predicting or classifying the actions being performed by a human being. • It is an important area of research. • Different data modalities used for HAR: 1. Optical Flow 2. RGB Images 3. Body Skeletons • Deep Learning techniques are used to predict and recognize human actions.
Problem Statement
• The major goal is to build and improve a model that can recognize and interpret human actions using skeletal data.
• Traditional techniques frequently depend on RGB video data, which can be affected by lighting and occlusions. By capturing human actions through the spatial configuration of joints, skeleton-based action recognition provides a more robust and efficient alternative.
• However, skeleton-based action recognition faces a number of obstacles:
• It is still difficult to extract relevant information from skeletal data and properly capture the spatiotemporal dynamics of human motion.
• Existing approaches usually process the entire sequence of body skeletons that represents the action performed. This strategy is inefficient in terms of computation time and memory utilization.
• We propose an attention-based spatiotemporal graph convolutional network to overcome these challenges.
Recognition using Skeleton-based Data
• Body skeletons are increasingly used for human action recognition due to their compact and action-focused nature.
• Skeletons are three-dimensional or two-dimensional coordinate representations of human body joints.
• Skeletons are structured as graphs, where the graph's nodes represent the skeleton's joints and the edges indicate the connections between body joints.
• Actions can be identified from the distinct motion patterns of the joints of the skeletal body.
Graph Convolutional Networks
• Graph Convolutional Networks (GCNs) have become quite prominent in the field of skeleton-based action recognition.
• Using deep feed-forward architectures, GCNs successfully capture the spatiotemporal characteristics inherent in human skeletons.
• GCNs are a variant of Convolutional Neural Networks (CNNs) that generalize convolution to graph-structured data.
• GCNs operate, in a manner similar to CNNs, by inspecting the neighboring nodes.
• The input is non-Euclidean, graph-structured data in which each node may have a varying number of connections.
• Nodes and their connections (edges) to other nodes are represented with an adjacency matrix, which is incorporated into the forward propagation equation.
Graph Convolutional Network
• The initial input to the GCN is the graph's node features along with the adjacency matrix. It records local connectivity patterns as well as information from surrounding nodes.
• A weight matrix is applied to the aggregated node features: it is multiplied with the aggregated features to compute the transformed features for each node.
• Finally, a non-linear activation function is applied to obtain the updated node representations, which then serve as the next layer's input along with the adjacency matrix.
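The propagation rule outlined above (aggregate neighbor features via the adjacency matrix, apply a weight matrix, then a non-linearity) can be sketched in plain Python on a toy graph. This is only an illustrative sketch, not the model's implementation: the 3-node graph, feature values, and weight matrix below are made up, and degree normalization is omitted for brevity.

```python
# Minimal GCN layer: H' = ReLU(A_hat @ H @ W), where A_hat is the
# adjacency matrix with self-loops added (illustrative, unnormalized).

def matmul(a, b):
    """Multiply two matrices given as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(A, H, W):
    # Add self-loops so each node keeps its own features.
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(len(A))]
             for i in range(len(A))]
    # Aggregate neighbor features, apply the weight matrix, then ReLU.
    Z = matmul(matmul(A_hat, H), W)
    return [[max(0.0, v) for v in row] for row in Z]

# Toy 3-node chain graph (0-1-2), two input and two output features.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
H = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
W = [[0.5, -0.5],
     [0.5, 0.5]]
H_next = gcn_layer(A, H, W)
```

Note how node 1, which has two neighbors, ends up aggregating features from the entire chain in a single layer.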
Dataset
• The model is evaluated on a large-scale indoor action recognition dataset, NTU-RGB+D.
• It is one of the most comprehensive datasets with 3D annotations for human action recognition tasks.
• The video clips were acquired by Microsoft Kinect v2 sensors.
• It comprises 60 action classes and 56,880 action samples, with the actions carried out by 40 distinct subjects.
• The action classes cover a wide variety of daily actions, including walking, waving, clapping, sitting down, getting up, and playing musical instruments. The dataset consists of both individual actions and interactions between two subjects.
• The annotations obtained from the Kinect depth sensors provide 3D joint positions (X, Y, Z).
• A total of 25 joints are present for each subject in a skeleton sequence.
• The dataset has two evaluation benchmarks:
1. Cross-view (X-view): Three different cameras capture the videos. The training set comprises 37,920 video clips and the test set consists of 18,960 video clips.
2. Cross-subject (X-sub): Focuses on cross-subject action recognition. The training set has 40,320 videos and the test set contains 16,560 videos.
Proposed Approach
• The proposed approach is an attention-based model for human action recognition that uses both temporal and spatial attention modules to improve recognition accuracy.
• The temporal attention module selects the most informative frames from a sequence of skeletons, capturing the action's critical temporal dynamics.
• Following that, the spatial attention mechanism highlights the most significant joints within the selected frames, emphasizing their distinguishing characteristics.
• The computed attention scores are then used to select frames, allowing the identification of the skeletons with the highest attention values.
• By incorporating the attention modules into a graph convolutional network, both temporal and spatial relationships in the skeletal data are effectively utilized.
• Together, the temporal and spatial attention mechanisms improve the efficiency of skeleton-based human action recognition, resulting in more accurate and robust results.
Spatial Graph Convolution Module
• The spatial graph convolution block focuses on capturing spatial relationships.
• It treats the nodes of the network as skeletal joints and the edges as connections that describe interactions between these joints.
• The block aggregates information from nearby nodes in order to capture local interactions and dependencies.
• The block accepts a tensor of shape (N, Cin, T, V) as input, where N denotes the batch size, Cin the number of input channels, T the sequence length, and V the number of joints.
• Along with the input tensor, an adjacency matrix is used to describe pairwise connections between the joints.
• The block then runs a graph convolution operation, which updates the features of each joint by aggregating information from its neighboring joints.
• Non-linear transformations follow the graph convolution to introduce non-linearity and improve the discriminative capability of the features.
Temporal Graph Convolution Module
• The temporal convolution block captures the temporal dynamics in the skeletal data.
• The block takes as input the output of the spatial graph convolution block.
• For temporal graph convolution, the neighborhood of each vertex is extended to include the node's temporal neighbors.
• Each node is linked to the same node in the preceding and following skeleton frames, resulting in a temporal neighborhood size of 2 for each node.
• A 2D convolution processes the extended neighborhood, with a kernel that determines the operation's temporal receptive field.
• Using a fixed kernel size, the temporal convolution block aggregates the features of each individual body joint across time steps, allowing the model to capture the temporal dynamics and patterns contained in the skeletal data.
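The idea of aggregating a joint's features across its temporal neighbors can be illustrated with a simple 1D convolution over one joint's feature sequence. This is a sketch with made-up values: the averaging kernel and the kernel size of 3 (a frame plus its preceding and following neighbors, matching the neighborhood described above) are illustrative, not the model's learned weights.

```python
# Temporal convolution sketch: convolve one joint's scalar feature
# sequence over time with a small kernel, zero-padding the borders so
# the output has the same length as the input.

def temporal_conv(seq, kernel):
    """1D 'same' convolution over the time axis with zero padding."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + seq + [0.0] * pad
    return [sum(kernel[i] * padded[t + i] for i in range(k))
            for t in range(len(seq))]

# One joint's feature value over 5 frames (illustrative).
frames = [1.0, 2.0, 3.0, 4.0, 5.0]
# Averaging kernel over {t-1, t, t+1}: each output mixes a frame with
# its temporal neighbors.
kernel = [1 / 3, 1 / 3, 1 / 3]
smoothed = temporal_conv(frames, kernel)
```

In the actual block this operation runs as a 2D convolution applied to every joint and channel at once; the 1D version above isolates the temporal receptive-field behavior for a single joint.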
Temporal Attention Block
• Temporal attention is used to detect the relevant frames within a sequence of frames.
• Its goal is to identify and emphasize the frames that make a significant contribution to the overall interpretation of an action.
• The temporal attention module computes the average activation over all joints and channels for each frame.
• This aggregates the collective information inside each frame, indicating its overall significance.
• The model learns the weight associated with each frame by passing the aggregated frame-level activations through a linear layer.
• A sigmoid activation function is then applied to these weights, producing attention weights for each frame.
• These attention weights serve as a mask, selectively amplifying or suppressing specific frame activations.
• By applying the attention weights to the input, the model can successfully focus on the frames deemed significant or informative for the given action.
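The steps above (average over joints and channels, linear layer, sigmoid, mask) can be sketched in plain Python. The scalar linear-layer parameters `w` and `b` and the toy activations are hypothetical stand-ins for the learned weights and real feature maps; this is an illustration of the mechanism, not the model's code.

```python
import math

# Temporal attention sketch: score each frame by its mean activation,
# pass the score through a (hypothetical) learned linear layer, squash
# with a sigmoid, and rescale the frame by the resulting weight.

def temporal_attention(x, w, b):
    """x: [T][V][C] activations; w, b: scalar linear-layer parameters."""
    T = len(x)
    # Mean activation per frame over all joints and channels.
    scores = [sum(sum(joint) for joint in frame) /
              (len(frame) * len(frame[0])) for frame in x]
    # Linear layer + sigmoid -> attention weight in (0, 1) per frame.
    weights = [1.0 / (1.0 + math.exp(-(w * s + b))) for s in scores]
    # Mask: amplify or suppress each frame by its attention weight.
    attended = [[[c * weights[t] for c in joint] for joint in x[t]]
                for t in range(T)]
    return weights, attended

# Toy input: 2 frames, 2 joints, 2 channels (values are illustrative).
x = [[[1.0, 1.0], [1.0, 1.0]],
     [[4.0, 4.0], [4.0, 4.0]]]
weights, attended = temporal_attention(x, w=1.0, b=-2.0)
```

The frame with the higher mean activation receives the larger attention weight, so its activations survive the masking step with less attenuation.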
Spatial Attention Block
• Spatial attention is concerned with identifying important joints within the frames highlighted by the temporal attention module.
• It computes the average activation for each joint over all frames and channels.
• The joint-level activations are then passed to a linear layer, which allows the model to learn the weight associated with each joint.
• These weights are fed into a sigmoid activation function, which produces attention weights indicating the relative significance of each joint.
• The generated attention weights operate as a mask: applying them to the attended frames selectively adjusts the activations of individual joints based on their relevance.
• Finally, a subset of the most informative skeletons is fetched from the given sequence of skeletons.
• The skeletons are sorted in decreasing order of their attention weights.
• These selected skeletons are then passed to the network's subsequent layers for further processing and analysis.
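The frame-selection step described above (sort skeletons by attention weight, keep the top scorers) can be sketched as follows. The frame labels and weight values are illustrative stand-ins for real skeleton tensors and the weights a trained attention module would produce.

```python
# Frame selection sketch: keep the k frames with the highest attention
# weights, sorted in decreasing order of weight.

def select_frames(frames, weights, k):
    """Return the k frames with the highest attention weights,
    plus the indices of the kept frames."""
    order = sorted(range(len(frames)), key=lambda t: weights[t],
                   reverse=True)
    keep = order[:k]
    return [frames[t] for t in keep], keep

# Illustrative stand-ins: frame labels instead of skeleton tensors.
frames = ["skel_0", "skel_1", "skel_2", "skel_3"]
weights = [0.10, 0.80, 0.30, 0.95]
selected, indices = select_frames(frames, weights, k=2)
```

Only the selected subset is forwarded to the remaining layers, which is where the computation and memory savings claimed for the approach come from.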
Network Architecture
• Skeleton data comprising the coordinate locations of all the joints of the skeleton is fed as the input to the model.
• The model is made up of six graph convolutional layers and an attention module.
• Initially, two spatial graph convolutional layers are present in the model; each focuses on capturing spatial relationships.
• Temporal and spatial attention blocks follow the spatial graph convolutional layers. The temporal attention module captures the most informative frames from a sequence of skeletons, and the spatial attention mechanism emphasizes the most informative joints within the highlighted frames. Frame selection is then performed to keep the skeletons with the highest attention scores.
• The last four graph convolutional layers combine spatial and temporal convolutions. The temporal convolution block, which takes as input the output of the spatial graph convolution block, captures the temporal dynamics in the skeletal data.
• Each skeleton sequence's enhanced spatiotemporal features are passed through global average pooling, yielding a 256-dimensional output feature vector.
• Finally, human actions are classified using a fully connected layer with a softmax classifier. To reduce classification error, the model is trained end-to-end via backpropagation.
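The classification head at the end of the pipeline (global average pooling, fully connected layer, softmax) can be sketched on toy data. The sizes, weights, and features below are illustrative assumptions (the actual pooled vector is 256-dimensional and the weights are learned):

```python
import math

# Classification head sketch: pool features over time and joints to one
# vector per sequence, then map it to class probabilities.

def gap(x):
    """x: [C][T][V] features -> length-C vector (mean over T and V)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in x]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def classify(x, W, b):
    f = gap(x)                                   # pooled feature vector
    logits = [sum(W[c][i] * f[i] for i in range(len(f))) + b[c]
              for c in range(len(W))]            # fully connected layer
    return softmax(logits)

# Toy features: 2 channels, 2 frames, 2 joints; 2 action classes.
x = [[[1.0, 1.0], [1.0, 1.0]],
     [[0.0, 0.0], [0.0, 0.0]]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
probs = classify(x, W, b)
```

Because pooling removes the time and joint dimensions, the head's parameter count is independent of sequence length, which keeps the classifier small regardless of how many frames survive selection.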
Experimental Results
• The model achieved Top-1 and Top-5 accuracies of 83.58% and 97.07% on the cross-subject benchmark, and Top-1 and Top-5 accuracies of 91.22% and 98.85% on the cross-view benchmark.
• The model is compared with other notable approaches on Top-1 accuracy for both the cross-subject and cross-view benchmarks.
Conclusion
• Utilization of temporal and spatial attention mechanisms enhanced the model's performance.
• The model is evaluated on the widely used NTU-RGB+D benchmark, assessing its Top-1 and Top-5 accuracies on the dataset's cross-view and cross-subject benchmarks.
• In comparison to other skeleton-based models, we observed a significant performance gap between RNN- and CNN-based methods and our method. Furthermore, our method outperformed other GCN-based models, demonstrating the benefit of incorporating spatial and temporal attention mechanisms into a graph convolutional network, which enhanced the model's accuracy and efficiency.
References
• S. Yan, Y. Xiong and D. Lin, "Spatial temporal graph convolutional networks for skeleton-based action recognition," in Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
• N. Heidari and A. Iosifidis, "Temporal attention-augmented graph convolutional network for efficient skeleton-based human action recognition," in 2020 25th International Conference on Pattern Recognition (ICPR), 2021.
• T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in International Conference on Learning Representations, 2017.
• A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar and L. Fei-Fei, "Large-scale video classification with convolutional neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
• Y. Du, W. Wang and L. Wang, "Hierarchical recurrent neural network for skeleton based action recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.