Computer vision that can ‘see’ in the dark

Bibliographic Details
Main Authors: Wong, Yan Chiew; Ahmad Radzi, Syafeeza; Goh, Shi Yong; Sarban Singh, Ranjit Singh
Format: Article
Language: English
Published: Institute of Advanced Engineering and Science, 2024
Online Access: http://eprints.utem.edu.my/id/eprint/28737/2/01298150420251318331746.pdf
http://eprints.utem.edu.my/id/eprint/28737/
https://ijai.iaescore.com/index.php/IJAI/article/view/24211
http://doi.org/10.11591/ijai.v13.i3.pp2883-2892
Description
Summary: Insufficient lighting makes safety monitoring of night-shift workers challenging. We therefore developed a computer vision-based algorithm that recognizes 11 actions from the action recognition in the dark (ARID) dataset. The proposed hybrid model integrates a convolutional neural network (CNN) into YOLOv7, a real-time object detection algorithm for images and video designed for fast, accurate detection in applications such as autonomous vehicles and surveillance systems. Dark video frames are first enhanced by the CNN and then fed into the YOLOv7 network for activity recognition. Adaptive gamma intensity correction (GIC) is also integrated to further improve the overall result. The proposed model was evaluated under different enhancement modes. It handles dark video frames with 74.95% Top-1 accuracy at a processing speed of 93.99 ms/frame on a 4 GB RTX 3050 graphics processing unit (GPU) and 17.59 ms/frame on a 16 GB Tesla T4 GPU. Its base size is small, only 74.8 MB, yet it has 36.54 M parameters in total, indicating that it has the capacity to learn meaningful information with limited hardware resources.
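The summary does not say how the adaptive gamma intensity correction (GIC) picks its gamma value. As an illustration only, here is a minimal sketch assuming one common heuristic: choose gamma so that the frame's mean normalized intensity is mapped to 0.5, then apply out = 255 * (in / 255) ** gamma. It is not the authors' method; the function names (`adaptive_gamma`, `gamma_correct`, `enhance_dark_frame`), the 0.5 target and the flat list of 8-bit grayscale pixels are all hypothetical choices, and the CNN enhancement and YOLOv7 stages are not modeled.

```python
import math
from typing import List


def adaptive_gamma(pixels: List[int], target: float = 0.5) -> float:
    """Hypothetical heuristic: solve mean ** gamma == target for gamma.

    For a dark frame (mean < target) this yields 0 < gamma < 1, which brightens it.
    """
    if not pixels:
        raise ValueError("empty frame")
    # Mean intensity normalized to [0, 1] for 8-bit pixels.
    mean = sum(pixels) / (255.0 * len(pixels))
    # Clamp to avoid log(0) on an all-black frame and a division by zero
    # (log(1) == 0) on an all-white frame.
    eps = 1e-6
    mean = min(max(mean, eps), 1.0 - eps)
    return math.log(target) / math.log(mean)


def gamma_correct(pixels: List[int], gamma: float) -> List[int]:
    """Apply out = 255 * (in / 255) ** gamma through a 256-entry lookup table."""
    lut = [round(255 * (v / 255.0) ** gamma) for v in range(256)]
    return [lut[p] for p in pixels]


def enhance_dark_frame(pixels: List[int]) -> List[int]:
    """Assumed pipeline step: estimate gamma from the frame, then correct it."""
    return gamma_correct(pixels, adaptive_gamma(pixels))


if __name__ == "__main__":
    dark = [10, 20, 30, 40]
    print(adaptive_gamma(dark))      # below 1, so the frame is brightened
    print(enhance_dark_frame(dark))  # every value larger than its input
```

A lookup table is used because an 8-bit frame has only 256 possible input values, so the power function is evaluated once per value rather than once per pixel; an array library would apply the same table to a whole frame in a single vectorized step.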