A deep Spatio-temporal network for vision-based sexual harassment detection

Bibliographic Details
Main Authors: Islam, Md Shamimul; Hasan, Md Mahedi; Abdullah, Sohaib; Md Akbar, Jalal Uddin; Arafat, N. H.M.; Murad, Saydul Akbar
Format: Conference or Workshop Item
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2021
Online Access: http://umpir.ump.edu.my/id/eprint/42383/1/A%20deep%20Spatio-temporal%20network%20for%20vision-based%20sexual.pdf
http://umpir.ump.edu.my/id/eprint/42383/2/A%20deep%20Spatio-temporal%20network%20for%20vision-based%20sexual%20harassment%20detection_ABS.pdf
http://umpir.ump.edu.my/id/eprint/42383/
https://doi.org/10.1109/ETCCE54784.2021.9689891
Description
Summary: Smart surveillance systems can play a significant role in detecting sexual harassment in real time for law enforcement, which can reduce the incidence of sexual harassment. Detecting sexual harassment from video in real time is a complex computer vision task because of factors such as clothing or carrying variation, illumination variation, partial occlusion, low resolution, and view-angle variation. Owing to advances in convolutional neural networks (CNNs) and Long Short-Term Memory (LSTM) networks, human action recognition has achieved great success in recent years. However, sexual harassment detection has not been well addressed because of the absence of a large-scale harassment dataset. In this work, to address this problem, we build a video dataset of sexual harassment, namely the Sexual Harassment Video (SHV) dataset, which consists of harassment and non-harassment videos collected from YouTube. In addition, we build a CNN-LSTM network to detect sexual harassment, in which a CNN and an RNN are employed to extract spatial and temporal features, respectively. State-of-the-art pretrained models are also employed as spatial feature extractors, combined with an LSTM and three dense layers, to classify harassment activities. Moreover, to evaluate the robustness of our proposed model, we conducted several experiments with it on two other benchmark datasets, the Hockey Fight dataset and the Movie Violence dataset, and achieved state-of-the-art accuracy.
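
The data flow the summary describes (per-frame spatial features, an LSTM over the frame sequence, then three dense layers) might look roughly like the following dependency-free Python sketch. It is not the authors' implementation: the single random 3x3 convolution stands in for the pretrained CNN backbones, all weights are random and untrained, biases are omitted, and every name and size (`conv_features`, `lstm_step`, `classify_clip`, `N_KERNELS`, `HIDDEN`, the 16x16 frames) is a hypothetical choice made for illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the illustrative run is repeatable

def rand_matrix(rows, cols, scale=0.1):
    """Random weights; a real model would learn these (assumption: untrained)."""
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv_features(frame, kernels):
    """Stand-in spatial extractor: 3x3 valid convolution, ReLU,
    then global average pooling, giving one value per kernel."""
    h, w = len(frame), len(frame[0])
    feats = []
    for k in kernels:
        total = 0.0
        for i in range(h - 2):
            for j in range(w - 2):
                s = sum(k[a][b] * frame[i + a][j + b]
                        for a in range(3) for b in range(3))
                total += max(0.0, s)
        feats.append(total / ((h - 2) * (w - 2)))
    return feats

def lstm_step(x, h, c, W):
    """One LSTM time step; W maps gate name -> weights over [x; h]."""
    z = x + h  # concatenate input features and previous hidden state
    i = [sigmoid(v) for v in matvec(W["i"], z)]    # input gate
    f = [sigmoid(v) for v in matvec(W["f"], z)]    # forget gate
    o = [sigmoid(v) for v in matvec(W["o"], z)]    # output gate
    g = [math.tanh(v) for v in matvec(W["g"], z)]  # candidate cell state
    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c

def classify_clip(clip, kernels, W, dense):
    """Spatial features per frame -> LSTM over time -> three dense layers."""
    hidden = len(W["i"])
    h, c = [0.0] * hidden, [0.0] * hidden
    for frame in clip:
        h, c = lstm_step(conv_features(frame, kernels), h, c, W)
    x = h
    for layer in dense[:-1]:
        x = [max(0.0, v) for v in matvec(layer, x)]  # ReLU dense layers
    return sigmoid(matvec(dense[-1], x)[0])          # probability-like score

N_KERNELS, HIDDEN = 4, 8
kernels = [rand_matrix(3, 3, 1.0) for _ in range(N_KERNELS)]
W = {gate: rand_matrix(HIDDEN, N_KERNELS + HIDDEN) for gate in "ifog"}
dense = [rand_matrix(8, HIDDEN), rand_matrix(4, 8), rand_matrix(1, 4)]

# A toy "clip": 5 grayscale frames of 16x16 random pixel intensities.
clip = [[[random.random() for _ in range(16)] for _ in range(16)] for _ in range(5)]
score = classify_clip(clip, kernels, W, dense)
print(f"score from untrained sketch: {score:.3f}")
```

Because the weights are random, the printed score carries no meaning; the sketch only shows how a clip is reduced frame by frame to feature vectors, how the LSTM state accumulates them in temporal order, and how the final hidden state feeds the dense classifier.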