Hybrid sampling and random forest machine learning approach for software defect prediction

Bibliographic Details
Main Authors:	Md. Anwar Hossen; Md. Shariful Islam; Nurhafizah Abu Talip; Md. Sakib Rahman; Fatema Siddika; Mostafijur Rahman; Sabira Khatun; Mohamad Shaiful Abdul Karim; S. M. Hasan Mahmud
Format:	Conference or Workshop Item
Language:	English
Published:	2019
Subjects:	TK Electrical engineering. Electronics Nuclear engineering
Online Access:	http://umpir.ump.edu.my/id/eprint/26687/1/42.%20Hybrid%20sampling%20and%20random%20forest%20machine%20learning.pdf http://umpir.ump.edu.my/id/eprint/26687/2/42.1%20Hybrid%20sampling%20and%20random%20forest%20machine%20learning.pdf http://umpir.ump.edu.my/id/eprint/26687/

building	UMPSA Library
collection	Institutional Repository
content_provider	Universiti Malaysia Pahang Al-Sultan Abdullah
content_source	UMPSA Institutional Repository
continent	Asia
country	Malaysia
description	Software has become an indispensable part of human life. In the current computing era, many large-scale complex network systems and millions of modern devices produce a huge amount of data every second, and a relatively large share of this data is imbalanced. Such imbalanced data can mislead machine learning models. Software Defect Prediction (SDP) is one of the most helpful activities during the testing phase. Finding and fixing defects is estimated to cost billions of pounds per year. Software defect prediction has emerged to reduce this cost, but it needs fine-tuning to reach the expected efficiency. In this chapter, we propose a new machine learning model that predicts software defects and identifies the key factors that can help software engineers locate the most defect-prone parts of a system. The proposed model works as follows. First, highly correlated features are removed and all features are brought to the same scale using feature scaling. Second, the Synthetic Minority Over-sampling Technique (SMOTE), Adaptive Synthetic (ADASYN) sampling and a hybrid sampling method are used to balance the highly imbalanced datasets. Third, Random Forest importance and the chi-square algorithm are used to find the factors that have a high effect on software defects. Cross-validation is used to mitigate overfitting. The scikit-learn library is used for the machine learning algorithms, pandas for data processing, and Matplotlib and PyPlot for graphs and data visualization. The combination of the hybrid sampling method and the Random Forest (RF) algorithm achieved the highest prediction accuracy, about 93.26%, showing its superiority.
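The first step of the pipeline (dropping highly correlated features, then scaling) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the 0.9 correlation threshold, the greedy keep-first rule, the choice of min-max scaling to [0, 1], and the feature names `loc`, `loc_blank` and `complexity` are all hypothetical, since the record does not specify them. In practice the same step would typically use pandas (`DataFrame.corr()`) and scikit-learn's `MinMaxScaler` or `StandardScaler`.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant column counts as uncorrelated
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if |r| <= threshold against every feature kept so far.

    columns maps feature name -> list of values; features listed earlier take priority.
    """
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept


def min_max_scale(values):
    """Rescale one feature to [0, 1]; a constant feature becomes all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical module metrics: loc_blank is a perfect linear function of loc.
metrics = {
    "loc": [10, 20, 30, 40],
    "loc_blank": [2, 4, 6, 8],
    "complexity": [3, 1, 4, 1],
}
kept = drop_correlated(metrics, threshold=0.9)
scaled = {name: min_max_scale(vals) for name, vals in kept.items()}
print(sorted(scaled))  # ['complexity', 'loc']
```

Which member of a correlated pair is dropped depends here on column order; other rules (for example, dropping the feature with the higher mean correlation) are equally plausible.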
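The balancing step can be illustrated with a from-scratch sketch of SMOTE, which creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority neighbours. The record does not say how the hybrid method combines techniques; here we assume, purely for illustration, that it pairs SMOTE oversampling of the defective (minority) class with random undersampling of the clean (majority) class until both reach a common size. The function names `smote` and `hybrid_sample`, the default `k=5` and the midpoint target are hypothetical. Library implementations exist in imbalanced-learn (`imblearn.over_sampling.SMOTE`, `imblearn.over_sampling.ADASYN`, and combined samplers such as `imblearn.combine.SMOTEENN`).

```python
import random
from math import dist


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points, each interpolated between a random minority
    sample and one of its k nearest minority neighbours (core SMOTE idea)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of base, excluding base itself.
        neighbours = sorted(
            (p for p in minority if p is not base), key=lambda p: dist(p, base)
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


def hybrid_sample(majority, minority, k=5, seed=0):
    """Hypothetical hybrid scheme: randomly undersample the majority class and
    SMOTE-oversample the minority class until both reach the midpoint size."""
    rng = random.Random(seed)
    target = (len(majority) + len(minority)) // 2
    kept_majority = rng.sample(majority, target)
    new_points = smote(minority, target - len(minority), k=k, rng=rng)
    return kept_majority, list(minority) + new_points


# Toy data: 90 clean modules vs 10 defective modules with 2 numeric features.
data_rng = random.Random(42)
majority = [[data_rng.random(), data_rng.random()] for _ in range(90)]
minority = [[data_rng.random() + 2, data_rng.random() + 2] for _ in range(10)]
balanced_majority, balanced_minority = hybrid_sample(majority, minority)
print(len(balanced_majority), len(balanced_minority))  # 50 50
```

Resampling of this kind should be applied only to the training data inside each cross-validation fold; otherwise synthetic points derived from test samples can inflate the measured accuracy.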
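For the factor-ranking step, the chi-square score of a non-negative feature compares, for each class, the observed sum of that feature's values with the sum expected if the feature were independent of the class. The sketch below computes this statistic in plain Python; to our understanding it follows the same formula as scikit-learn's `sklearn.feature_selection.chi2`, but it is an illustration rather than the authors' implementation, and the toy data is hypothetical. Random Forest importance, the other ranking method named in the abstract, is available in scikit-learn as the `feature_importances_` attribute of a fitted `RandomForestClassifier`.

```python
def chi2_scores(X, y):
    """Chi-square score of each non-negative feature against the class labels.

    X: list of rows of non-negative feature values (e.g. after min-max scaling)
    y: list of class labels, one per row
    Returns one score per feature; larger means stronger dependence on the class.
    """
    n = len(X)
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            rows_c = [row for row, label in zip(X, y) if label == c]
            observed = sum(row[j] for row in rows_c)
            # Expected sum if feature j were spread over classes by class size only.
            expected = total * len(rows_c) / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


# Toy data: feature 0 separates the classes, feature 1 is constant.
X = [[1, 0.5], [1, 0.5], [0, 0.5], [0, 0.5]]
y = [1, 1, 0, 0]
print(chi2_scores(X, y))  # [2.0, 0.0]
```

Sorting features by these scores gives a ranking of candidate defect-related factors; the statistic requires non-negative inputs, which is one reason scaling comes first.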
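Cross-validation can be sketched as stratified k-fold splitting, which keeps the defective/clean ratio roughly constant across folds. The helper names, the round-robin fold assignment without shuffling, and the toy labels are assumptions for illustration; scikit-learn offers `StratifiedKFold` and `cross_val_score` for real use. The example scores a trivial majority-class predictor, which reaches 80% accuracy on an 8:2 imbalanced label set without detecting a single defect, showing how imbalanced data can make accuracy misleading.

```python
from collections import defaultdict


def stratified_k_fold(y, k=5):
    """Yield (train_indices, test_indices) for k folds that each keep roughly the
    original class proportions (round-robin within each class, no shuffling)."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: 8 clean modules (0) and 2 defective modules (1).
labels = [0] * 8 + [1] * 2
scores = []
for train, test in stratified_k_fold(labels, k=2):
    train_labels = [labels[i] for i in train]
    # Trivial baseline: always predict the most common label of the training fold.
    most_common = max(set(train_labels), key=train_labels.count)
    scores.append(accuracy([labels[i] for i in test], [most_common] * len(test)))
mean_accuracy = sum(scores) / len(scores)
print(mean_accuracy)  # 0.8
```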
format	Conference or Workshop Item
id	my.ump.umpir.26687
institution	Universiti Malaysia Pahang
language	English
publishDate	2019
record_format	eprints
citation	Md. Anwar Hossen, Md. Shariful Islam, Nurhafizah Abu Talip, Md. Sakib Rahman, Fatema Siddika, Mostafijur Rahman, Sabira Khatun, Mohamad Shaiful Abdul Karim and S. M. Hasan Mahmud (2019) Hybrid sampling and random forest machine learning approach for software defect prediction. In: 5th International Conference on Electrical, Control and Computer Engineering (INECCE 2019), 29-30 July 2019, Swiss-Garden Beach Resort, Kuantan, Pahang, pp. 1-12. (Unpublished; peer reviewed)
title	Hybrid sampling and random forest machine learning approach for software defect prediction
topic	TK Electrical engineering. Electronics Nuclear engineering
url_provider	http://umpir.ump.edu.my/
