Shape-based multi-view human action recognition using distance-based-matrix-regardless-of-row-priority classifier
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2016 |
Online Access: | http://psasir.upm.edu.my/id/eprint/69314/1/FSKTM%202016%205%20IR.pdf http://psasir.upm.edu.my/id/eprint/69314/ |
Summary: The recognition of human activity (or action) in videos has elicited significant
attention in recent years given its potential use in many real-life applications. Human
Action Recognition (HAR) is typically applied in fields such as human–computer
interaction, surveillance, content-based video retrieval, and sports event analysis.
HAR is a complex process because characteristics such as gender, height, body shape, and age considerably affect the visual representation of captured actions. In practical applications, changes in viewpoint are common and fundamentally unavoidable given the inherent limitations of camera technology and the inevitable dynamism of human motion. When such changes occur, the recognition rate of current HAR approaches decreases dramatically. This problem is typically mitigated by using cameras with multiple fields of view, which provide richer information than single-view cameras. Nonetheless, even with such innovations, ensuring accurate correlation and acquiring multi-view learning data remain complicated challenges.
This work proposes four methods to advance the field of HAR. Shape-based features are extracted from frame silhouettes using the proposed Global Silhouette Shape Representation (GSSR) method; GSSR is suitable because silhouettes convey spatial information about actions over time. Concatenation, as a data fusion technique, is then applied to create a multi-view feature vector from a combination of single-view feature vectors. In other words, a matrix of multi-view features is generated for each action.
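As a rough illustration of this concatenation (data fusion) step only, here is a minimal sketch; the view count, frame count, and GSSR feature dimensionality below are hypothetical, and GSSR itself is treated as a black box:

```python
import numpy as np

def fuse_views(per_view_features):
    """Concatenate single-view feature vectors into one multi-view vector per frame.

    per_view_features: list of arrays, one per camera view, each of shape
    (n_frames, d), where d is the per-view feature length.
    Returns an (n_frames, n_views * d) matrix, i.e. one multi-view feature
    matrix per action.
    """
    return np.concatenate(per_view_features, axis=1)

# Hypothetical example: 4 views, 30 frames, 64-dimensional features per view.
views = [np.random.rand(30, 64) for _ in range(4)]
action_matrix = fuse_views(views)   # shape (30, 256)
```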
The Maximum-Distance-among-Feature-Vectors (MDFV) technique, a frame selection method, is employed to choose a subset of frames (or feature vectors) with the maximum difference among them; this strategy is based on the removal of frames with mostly similar features. Relevant and suitable features are then selected using the Binary Particle Swarm Optimization (BPSO) technique.
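The abstract does not spell out how the maximally different subset of frames is found; the sketch below uses a greedy farthest-point selection as one plausible reading of MDFV, with the subset size `k` as a hypothetical parameter:

```python
import numpy as np

def mdfv_select(frames, k):
    """Select k frames (feature vectors) that are maximally different from
    each other, discarding frames with mostly similar features.

    frames: (n_frames, d) multi-view feature matrix of one action.
    Returns the indices of the selected frames.
    """
    n = len(frames)
    # Pairwise Euclidean distances between all frames.
    dist = np.linalg.norm(frames[:, None, :] - frames[None, :, :], axis=2)
    # Start with the frame that is, in total, farthest from all others.
    selected = [int(dist.sum(axis=1).argmax())]
    while len(selected) < min(k, n):
        remaining = [i for i in range(n) if i not in selected]
        # Greedily add the frame farthest from the already selected subset.
        next_idx = max(remaining, key=lambda i: dist[i, selected].min())
        selected.append(next_idx)
    return sorted(selected)
```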
This research likewise develops the Distance-based-Matrix-Regardless-of-Row-Priority (DMRRP) classifier, which is driven by the idea that the sum (or mean) of the minimum distances between each individual frame of sequence 1 and all the frames of sequence 2 reflects the similarity between the two actions, whether the sequences are performed by the same or different individuals. This classifier can recognize actions captured from different views.
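A minimal sketch of this sequence distance follows, assuming Euclidean frame-to-frame distances and a nearest-neighbour decision rule (both assumptions not stated in the abstract):

```python
import numpy as np

def dmrrp_distance(seq_a, seq_b, reduce="mean"):
    """Distance between two action sequences, regardless of frame (row) order.

    For every frame of seq_a, take the minimum distance to any frame of
    seq_b, then sum (or average) these minima.
    seq_a: (m, d) feature matrix, seq_b: (n, d) feature matrix.
    """
    pairwise = np.linalg.norm(seq_a[:, None, :] - seq_b[None, :, :], axis=2)  # (m, n)
    minima = pairwise.min(axis=1)   # minimum distance per frame of seq_a
    return minima.mean() if reduce == "mean" else minima.sum()

def classify(test_seq, train_seqs, train_labels):
    """Assign the label of the training sequence with the smallest DMRRP
    distance (a nearest-neighbour rule, assumed here for illustration)."""
    dists = [dmrrp_distance(test_seq, s) for s in train_seqs]
    return train_labels[int(np.argmin(dists))]
```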
Finally, this study evaluates the performance of the proposed Multi-View Human Action Recognition Based on Shape-Based Feature Extraction and Distance-Based Classifier (MHARSD) framework in single- and multi-view HAR. To evaluate this approach, experiments involving two publicly available multi-view HAR datasets (i.e., MuHAVi and IXMAS) are conducted to determine the quality of recognition that the method produces for different actions. MHARSD supports the recognition of a wide range of human actions, and in all evaluations it exhibits recognition accuracy higher than that achieved by state-of-the-art 2D multi-view HAR methods.