Spatio-temporal normalized joint coordinates as features for skeleton-based human action recognition
Human Action Recognition (HAR) is critical in video monitoring, human-computer interaction, video comprehension, and virtual reality. While significant progress has been made in the HAR domain in recent years, developing an accurate, fast, and efficient system for video action recognition remains a challenge due to a variety of obstacles, such as changes in camera viewpoint, occlusions, background, and motion speed. In general, an action recognition model learns spatial and temporal features in order to classify human actions. State-of-the-art deep learning approaches to skeleton-based action recognition rely primarily on Recurrent Neural Networks (RNN) or Convolutional Neural Networks (CNN). RNN-based methods model only the long-term contextual information in the temporal domain and neglect the spatial configuration of the articulated skeleton, where the joints are strongly discriminative; as a result, it is challenging to extract high-level features. In contrast, CNN-based action recognition is incapable of modelling long-term temporal dependency: implementations typically stack a limited number of frames and convert them into images to represent spatio-temporal information, an approach that is susceptible to information loss during the conversion. This study proposes STEM-Coords, a pre-processing and feature extraction technique that effectively represents spatio-temporal features using joint coordinates from a human pose. The feature set, comprising normalized joint coordinates and their respective speeds, is represented in tabular form as input to a Neural Oblivious Decision Ensemble (NODE) classification model. STEM-Coords was validated on three benchmark datasets: KTH, RealWorld HAR, and MSR DailyActivity 3D. The method outperformed state-of-the-art approaches on every dataset, with accuracy rates of 97.3%, 99.3%, and 97.4%, respectively. The results demonstrate that the proposed method represents spatio-temporal information effectively and efficiently while remaining robust to partial occlusion, anthropometric variation, and changes in viewpoint.
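The abstract describes the feature set as normalized joint coordinates plus their per-joint speeds, laid out in tabular form for the NODE classifier. As a rough illustration only (the record does not specify the normalization, so the root-centring, reference-segment scaling, joint indices, and frame rate below are assumptions, not the thesis's actual STEM-Coords definition), such a per-frame feature row could be assembled as follows:

```python
# Illustrative sketch only: the exact STEM-Coords normalization is not
# detailed in this record, so the hip-centred, segment-length-scaled
# normalization below is an assumption.
import numpy as np

def stem_coords_features(joints, fps=30.0, root=0, ref_pair=(0, 1)):
    """Build one tabular feature row per frame from raw joint coordinates.

    joints : array of shape (T, J, D) with T frames, J joints, D coords (2 or 3).
    Returns an array of shape (T, J*D + J): normalized coordinates followed by
    per-joint speeds, suitable as tabular input to a classifier such as NODE.
    """
    joints = np.asarray(joints, dtype=np.float64)
    T, J, D = joints.shape

    # Normalize each frame: translate so the root joint sits at the origin,
    # then scale by a reference segment length (assumed, e.g. hip-to-spine).
    centred = joints - joints[:, root:root + 1, :]
    ref_len = np.linalg.norm(
        joints[:, ref_pair[0], :] - joints[:, ref_pair[1], :],
        axis=-1, keepdims=True)                                   # (T, 1)
    normalized = centred / np.maximum(ref_len[:, :, None], 1e-6)  # (T, J, D)

    # Per-joint speed: frame-to-frame displacement magnitude times frame rate.
    disp = np.diff(normalized, axis=0, prepend=normalized[:1])    # (T, J, D)
    speed = np.linalg.norm(disp, axis=-1) * fps                   # (T, J)

    # Flatten coordinates and append speeds -> one tabular row per frame.
    return np.concatenate([normalized.reshape(T, J * D), speed], axis=1)
```

Each row would then correspond to one frame, matching the tabular input format expected by a decision-ensemble model such as NODE.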
Main Author: | Nasrul ‘Alam, Fakhrul Aniq Hakimi |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2022 |
Subjects: | T Technology (General) |
Online Access: | http://eprints.utm.my/id/eprint/99599/1/FakhrulAniqHakimiMMJIIT2022.pdf http://eprints.utm.my/id/eprint/99599/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:150862 |
Citation: | Nasrul ‘Alam, Fakhrul Aniq Hakimi (2022) Spatio-temporal normalized joint coordinates as features for skeleton-based human action recognition. Masters thesis, Universiti Teknologi Malaysia. |