Neuro-fuzzy implementation on object detectors for whole image classification / Basyir Adam

Bibliographic Details
Main Author: Adam, Basyir
Format: Thesis
Language: English
Published: 2020
Online Access: https://ir.uitm.edu.my/id/eprint/59575/1/59575.pdf
https://ir.uitm.edu.my/id/eprint/59575/
Description
Summary: Open-source deep learning tools have been widely distributed and have gained popularity over the past decades. A major contribution of deep learning is object recognition, which benefits automated image organization, stock image and video services, image management, advertising, interactive marketing and creative campaigns, image recognition, and similar applications. Although many region-based object detectors run on GPUs, few thorough comparisons have been made between the latest GPUs on the market. A GPU can run neural network computations a hundred times faster than a regular CPU. The first experiment finds the most suitable GPU and the optimized parameters for a region-based object detector, based on the best mean average precision (mAP) obtained. Faster Region-based Convolutional Neural Network (Faster R-CNN) is one of the fastest region-based object detectors; it replaces the proposal method of its predecessors with a Region Proposal Network (RPN) to complete the network. A fast alternative is You Only Look Once (YOLO), which directly predicts class probabilities and bounding boxes from a full image in a single evaluation. The second experiment evaluates the performance of Faster R-CNN and YOLOv2 under different parameter settings using the GTX 1080, the best GPU identified in the first experiment. A current limitation of object recognition is that object detectors only detect objects in an image without classifying the whole image. The third experiment therefore implements a neuro-fuzzy system with Faster R-CNN and YOLOv2 for whole image classification, using the best GPU and the optimized parameters obtained from the previous experiments. Faster R-CNN is implemented in MATLAB with the Caffe framework, while YOLOv2 is implemented in C and Python with the Darknet framework. The parameters observed are minimum batch size, maximum pixel size and image scales.
The performance benchmarks are run on three GPU platforms from the GeForce GTX 10 series using the PASCAL VOC 2007 dataset. The hybrid systems are evaluated by their whole image classification accuracy on a custom dataset. With optimized parameters, the benchmark results for Faster R-CNN are 60.2% on the GTX 1060 (3GB), 59.8% on the GTX 1070 and 60.3% on the GTX 1080. The best result obtained for YOLOv2 is 75.0% mAP using optimized parameters. The proposed neuro-fuzzy system using YOLOv2 achieves an average whole image classification accuracy of 81% top-1 and 96% top-3. It outperforms Faster R-CNN with the VGG-16 network (67% top-1, 96% top-3) and Faster R-CNN with the ZF network (66% top-1, 94% top-3).
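
For readers unfamiliar with the top-1 and top-3 figures quoted above, the following is a minimal sketch of how top-k accuracy is commonly computed for whole image classification. It is not taken from the thesis: the function name `top_k_accuracy`, the score matrix and the labels are all hypothetical, and the scores merely stand in for whatever per-class output a classifier produces for each image.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of images whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one image is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this image.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in ranked:
            hits += 1
    return hits / len(labels)


# Hypothetical scores for 4 images over 4 classes (illustrative only,
# not data from the thesis).
example_scores = [
    [0.70, 0.20, 0.05, 0.05],  # true class 0: ranked first
    [0.10, 0.30, 0.40, 0.20],  # true class 1: ranked second
    [0.25, 0.25, 0.10, 0.40],  # true class 2: ranked last
    [0.05, 0.15, 0.20, 0.60],  # true class 3: ranked first
]
example_labels = [0, 1, 2, 3]

print(top_k_accuracy(example_scores, example_labels, k=1))
print(top_k_accuracy(example_scores, example_labels, k=3))
```

Top-3 accuracy can never be lower than top-1 accuracy on the same predictions, which is why two systems may tie on top-3 while differing on top-1.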