Human action interpretation using convolutional neural network: a survey
Human action interpretation (HAI) is one of the trending domains in the era of computer vision. It can further be divided into human action recognition (HAR) and human action detection (HAD). The HAR analyzes frames and provides label(s) to overall video, whereas the HAD localizes actor first, in ea...
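Main Authors: | Malik, Zainab; Shapiai, Mohd. Ibrahim |
---|---|
Format: | Article |
Published: | Springer Science and Business Media Deutschland GmbH, 2022 |
Subjects: | QA75 Electronic computers. Computer science; TK Electrical engineering. Electronics Nuclear engineering |
Online Access: | http://eprints.utm.my/id/eprint/102763/ http://dx.doi.org/10.1007/s00138-022-01291-0 |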
id |
my.utm.102763 |
---|---|
record_format |
eprints |
spelling |
my.utm.102763 2023-09-24T03:06:39Z http://eprints.utm.my/id/eprint/102763/ Human action interpretation using convolutional neural network: a survey Malik, Zainab Shapiai, Mohd. Ibrahim QA75 Electronic computers. Computer science TK Electrical engineering. Electronics Nuclear engineering Springer Science and Business Media Deutschland GmbH 2022-05 Article PeerReviewed Malik, Zainab and Shapiai, Mohd. Ibrahim (2022) Human action interpretation using convolutional neural network: a survey. Machine Vision and Applications, 33 (3). pp. 1-23. ISSN 0932-8092 http://dx.doi.org/10.1007/s00138-022-01291-0 DOI:10.1007/s00138-022-01291-0 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
topic |
QA75 Electronic computers. Computer science; TK Electrical engineering. Electronics Nuclear engineering |
description |
Human action interpretation (HAI) is one of the trending domains in the era of computer vision. It can further be divided into human action recognition (HAR) and human action detection (HAD). The HAR analyzes frames and provides label(s) to overall video, whereas the HAD localizes actor first, in each frame, and then estimates the action score for the detected region. The effectiveness of a HAI model is highly dependent on the representation of spatiotemporal features and the model’s architectural design. For the effective representation of these features, various studies have been carried out. Moreover, to better learn these features and to get the action score on the basis of these features, different designs of deep architectures have also been proposed. Among various deep architectures, convolutional neural network (CNN) is relatively more explored for HAI due to its lesser computational cost. To provide overview of these efforts, various surveys have been published to date; however, none of these surveys is focusing the features’ representation and design of proposed architectures in detail. Secondly, none of these studies is focusing the pose assisted HAI techniques. This study provides a more detailed survey on existing CNN-based HAI techniques by incorporating the frame level as well as pose level spatiotemporal features-based techniques. Besides these, it offers comparative study on different publicly available datasets used to evaluate HAI models based on various spatiotemporal features’ representations. Furthermore, it also discusses the limitations and challenges of the HAI and concludes that human action interpretation from visual data is still very far from the actual interpretation of human action in realistic videos which are continuous in nature and may contain multiple human beings performing multiple actions sequentially or in parallel. |
format |
Article |
author |
Malik, Zainab; Shapiai, Mohd. Ibrahim |
author_sort |
Malik, Zainab |
title |
Human action interpretation using convolutional neural network: a survey |
publisher |
Springer Science and Business Media Deutschland GmbH |
publishDate |
2022 |
url |
http://eprints.utm.my/id/eprint/102763/ http://dx.doi.org/10.1007/s00138-022-01291-0 |