A deep Spatio-temporal network for vision-based sexual harassment detection
Main Authors: | , , , , , |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | Institute of Electrical and Electronics Engineers Inc., 2021 |
Subjects: | |
Online Access: | http://umpir.ump.edu.my/id/eprint/42383/1/A%20deep%20Spatio-temporal%20network%20for%20vision-based%20sexual.pdf http://umpir.ump.edu.my/id/eprint/42383/2/A%20deep%20Spatio-temporal%20network%20for%20vision-based%20sexual%20harassment%20detection_ABS.pdf http://umpir.ump.edu.my/id/eprint/42383/ https://doi.org/10.1109/ETCCE54784.2021.9689891 |
Summary: | Smart surveillance systems can play a significant role in detecting sexual harassment in real time for law enforcement, which can help reduce such incidents. Detecting sexual harassment from video in real time is a complex computer vision task because of factors such as clothing or carrying variation, illumination variation, partial occlusion, low resolution, and view-angle variation. Owing to advances in convolutional neural networks (CNNs) and Long Short-Term Memory (LSTM) networks, human action recognition has achieved great success in recent years. However, sexual harassment detection has not been adequately addressed because of the absence of a large-scale harassment dataset. To address this problem, we build a video dataset of sexual harassment, the Sexual Harassment Video (SHV) dataset, which consists of harassment and non-harassment videos collected from YouTube. We also build a CNN-LSTM network to detect sexual harassment, in which a CNN and an RNN extract spatial and temporal features, respectively. State-of-the-art pretrained models are also employed as spatial feature extractors, followed by an LSTM and three dense layers to classify harassment activities. Moreover, to assess the robustness of the proposed model, we conducted several experiments on two other benchmark datasets, the Hockey Fight dataset and the Movie Violence dataset, and achieved state-of-the-art accuracy. |
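The data flow described in the summary (per-frame spatial features, an LSTM over the frame sequence, then three dense layers) can be illustrated with a small NumPy forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the tiny random convolution bank stands in for a pretrained CNN, the frames are assumed to be grayscale, the layer widths, gate ordering, and function names (`conv_features`, `lstm_last_state`, `dense_head`, `predict`) are hypothetical, and the weights are untrained, so the output probability carries no meaning.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_features(frame, kernels):
    """Stand-in spatial extractor (assumption: replaces a pretrained CNN).
    Valid 2D cross-correlation, ReLU, then global average pooling.
    frame: (H, W) grayscale image; kernels: (K, kh, kw). Returns (K,)."""
    windows = np.lib.stride_tricks.sliding_window_view(frame, kernels.shape[1:])
    maps = np.einsum("ijab,kab->kij", windows, kernels)
    return np.maximum(maps, 0.0).mean(axis=(1, 2))

def lstm_last_state(seq, W, U, b):
    """Run a single-layer LSTM over seq (T, D) and return the final hidden state.
    W: (4H, D), U: (4H, H), b: (4H,); assumed gate order: input, forget, output, candidate."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x in seq:
        z = W @ x + U @ h + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g          # update the cell state
        h = o * np.tanh(c)         # expose the gated hidden state
    return h

def dense_head(h, layers):
    """Three dense layers: ReLU on the first two, sigmoid on the last (assumed)."""
    for idx, (W, b) in enumerate(layers):
        h = W @ h + b
        if idx < len(layers) - 1:
            h = np.maximum(h, 0.0)
    return sigmoid(h)

def init_params(n_kernels=4, ksize=3, hidden=8, widths=(16, 8, 1)):
    """Random, untrained parameters; all sizes are illustrative assumptions."""
    kernels = rng.normal(0.0, 0.1, (n_kernels, ksize, ksize))
    lstm = (rng.normal(0.0, 0.1, (4 * hidden, n_kernels)),
            rng.normal(0.0, 0.1, (4 * hidden, hidden)),
            np.zeros(4 * hidden))
    dense, fan_in = [], hidden
    for width in widths:
        dense.append((rng.normal(0.0, 0.1, (width, fan_in)), np.zeros(width)))
        fan_in = width
    return {"kernels": kernels, "lstm": lstm, "dense": dense}

def predict(clip, params):
    """clip: (T, H, W) array of frames. Returns a probability-like score in (0, 1)."""
    feats = np.stack([conv_features(frame, params["kernels"]) for frame in clip])
    h = lstm_last_state(feats, *params["lstm"])
    return float(dense_head(h, params["dense"])[0])

# Usage: a synthetic 10-frame clip of 32x32 grayscale frames.
params = init_params()
clip = rng.random((10, 32, 32))
score = predict(clip, params)
```

In a trained system the convolution bank would be replaced by a pretrained backbone and the parameters learned from labeled clips; the sketch only shows how the spatial and temporal stages connect.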