Staff View: Comparison between imputation method for handling missing data / Ayunie Ezadin, Nur Izzaty Chumin and Siti Nur Izzatulnisa Salit

Comparison between imputation method for handling missing data / Ayunie Ezadin, Nur Izzaty Chumin and Siti Nur Izzatulnisa Salit

This paper presents imputation methods for the National Institute of Diabetes and Digestive and Kidney Diseases data from Arizona, United States. Missing data occur in five variables: plasma glucose concentration, diastolic blood pressure, triceps skin fold thickness, serum ins...

Bibliographic Details
Main Authors:	Ezadin, Ayunie, Chumin, Nur Izzaty, Salit, Siti Nur Izzatulnisa
Format:	Student Project
Language:	English
Published:	2021
Subjects:	Statistical data Study and teaching Data processing Analysis
Online Access:	https://ir.uitm.edu.my/id/eprint/59272/1/59272.pdf https://ir.uitm.edu.my/id/eprint/59272/

id	my.uitm.ir.59272
record_format	eprints
spelling	my.uitm.ir.59272 2022-05-12T08:44:09Z https://ir.uitm.edu.my/id/eprint/59272/ Comparison between imputation method for handling missing data / Ayunie Ezadin, Nur Izzaty Chumin and Siti Nur Izzatulnisa Salit Ezadin, Ayunie Chumin, Nur Izzaty Salit, Siti Nur Izzatulnisa Statistical data Study and teaching Data processing Analysis This paper presents imputation methods for the National Institute of Diabetes and Digestive and Kidney Diseases data from Arizona, United States. Five variables in this data set have missing values: plasma glucose concentration, diastolic blood pressure, triceps skin fold thickness, serum insulin intake and body mass index (BMI). Missing data can introduce bias and lead to invalid conclusions. The objectives of this research are to improve the data by filling in the missing values and to determine which imputation method handles missing values in a data set best. The imputation methods and the evaluation of their performance are applied to this data using RStudio. The five imputation methods used are mean imputation, K-Nearest Neighbour (KNN) imputation, multiple imputation, hot-deck imputation and regression imputation. Their performance is evaluated using statistical analysis, the coefficient of determination (R²), mean squared error (MSE), root mean square error (RMSE), mean absolute error (MAE), index of agreement (d) and bias (B). Based on the results of this research, KNN imputation is the best of the five methods for handling the missing values: it shows the best performance and the lowest error values.
2021 Student Project NonPeerReviewed text en https://ir.uitm.edu.my/id/eprint/59272/1/59272.pdf (2021) Comparison between imputation method for handling missing data / Ayunie Ezadin, Nur Izzaty Chumin and Siti Nur Izzatulnisa Salit. [Student Project] (Unpublished)
institution	Universiti Teknologi Mara
building	Tun Abdul Razak Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Mara
content_source	UiTM Institutional Repository
url_provider	http://ir.uitm.edu.my/
language	English
topic	Statistical data Study and teaching Data processing Analysis
spellingShingle	Statistical data Study and teaching Data processing Analysis Ezadin, Ayunie Chumin, Nur Izzaty Salit, Siti Nur Izzatulnisa Comparison between imputation method for handling missing data / Ayunie Ezadin, Nur Izzaty Chumin and Siti Nur Izzatulnisa Salit
description	This paper presents imputation methods for the National Institute of Diabetes and Digestive and Kidney Diseases data from Arizona, United States. Five variables in this data set have missing values: plasma glucose concentration, diastolic blood pressure, triceps skin fold thickness, serum insulin intake and body mass index (BMI). Missing data can introduce bias and lead to invalid conclusions. The objectives of this research are to improve the data by filling in the missing values and to determine which imputation method handles missing values in a data set best. The imputation methods and the evaluation of their performance are applied to this data using RStudio. The five imputation methods used are mean imputation, K-Nearest Neighbour (KNN) imputation, multiple imputation, hot-deck imputation and regression imputation. Their performance is evaluated using statistical analysis, the coefficient of determination (R²), mean squared error (MSE), root mean square error (RMSE), mean absolute error (MAE), index of agreement (d) and bias (B). Based on the results of this research, KNN imputation is the best of the five methods for handling the missing values: it shows the best performance and the lowest error values.
format	Student Project
author	Ezadin, Ayunie Chumin, Nur Izzaty Salit, Siti Nur Izzatulnisa
author_facet	Ezadin, Ayunie Chumin, Nur Izzaty Salit, Siti Nur Izzatulnisa
author_sort	Ezadin, Ayunie
title	Comparison between imputation method for handling missing data / Ayunie Ezadin, Nur Izzaty Chumin and Siti Nur Izzatulnisa Salit
title_short	Comparison between imputation method for handling missing data / Ayunie Ezadin, Nur Izzaty Chumin and Siti Nur Izzatulnisa Salit
title_full	Comparison between imputation method for handling missing data / Ayunie Ezadin, Nur Izzaty Chumin and Siti Nur Izzatulnisa Salit
title_fullStr	Comparison between imputation method for handling missing data / Ayunie Ezadin, Nur Izzaty Chumin and Siti Nur Izzatulnisa Salit
title_full_unstemmed	Comparison between imputation method for handling missing data / Ayunie Ezadin, Nur Izzaty Chumin and Siti Nur Izzatulnisa Salit
title_sort	comparison between imputation method for handling missing data / ayunie ezadin, nur izzaty chumin and siti nur izzatulnisa salit
publishDate	2021
url	https://ir.uitm.edu.my/id/eprint/59272/1/59272.pdf https://ir.uitm.edu.my/id/eprint/59272/
_version_	1732948390233243648
score	13.211869
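
The evaluation measures named in the description (MSE, RMSE, MAE, bias, R² and the index of agreement d) can be illustrated with a minimal sketch. This is not the authors' code: the project used RStudio, and the record does not state which formulas were applied. The sketch assumes the common definitions, with R² taken as 1 - SS_res/SS_tot and d as Willmott's index of agreement. The function names `mean_impute` and `evaluate_imputation` and the toy values are hypothetical.

```python
import math


def mean_impute(values):
    """Replace each missing entry (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def evaluate_imputation(observed, imputed):
    """Compare imputed values with the true values they stand in for.

    Assumed definitions (not confirmed by the record):
      R2 = 1 - SS_res / SS_tot
      d  = 1 - SS_res / sum((|P - mean(O)| + |O - mean(O)|)^2)
    """
    n = len(observed)
    o_mean = sum(observed) / n
    errors = [p - o for o, p in zip(observed, imputed)]

    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - o_mean) ** 2 for o in observed)
    d_denominator = sum(
        (abs(p - o_mean) + abs(o - o_mean)) ** 2
        for o, p in zip(observed, imputed)
    )

    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "bias": sum(errors) / n,
        "R2": 1 - ss_res / ss_tot,
        "d": 1 - ss_res / d_denominator,
    }


# Toy illustration: hide two known values, impute them, then score the result.
true_values = [5.0, 7.0]                  # values that were hidden
with_gaps = [4.0, None, 6.0, None, 8.0]   # hypothetical variable with gaps
filled = mean_impute(with_gaps)
scores = evaluate_imputation(true_values, [filled[1], filled[3]])
```

Lower MSE, RMSE and MAE, bias closer to zero, and R² and d closer to 1 indicate a better imputation; the same scoring function could be applied to the output of any of the five methods.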
