Evaluating the Effect of Dataset Size on Predictive Model Using Supervised Learning Technique

Learning models used for prediction are mostly developed without paying much attention to the size of the datasets that can produce models of high accuracy and better generalization. The general belief, however, is that a large dataset is needed to construct a predictive learning model. To des...

Bibliographic Details
Main Authors: Raheem, Ajiboye Adeleke, Ruzaini, Abdullah Arshah, Hongwu, Qin, Kebbe, H. Isah
Format: Article
Language: en
Published: Penerbit UMP 2015
Subjects: QA76 Computer software
Online Access: http://umpir.ump.edu.my/id/eprint/6085/1/EVALUATING%20THE%20EFFECT%20OF%20DATASET%20SIZE%20ON%20PREDICTIVE%20MODEL.pdf
http://umpir.ump.edu.my/id/eprint/6085/
http://ijsecs.ump.edu.my/images/archive/vol1/06Ajiboye_IJSECS.pdf
author Raheem, Ajiboye Adeleke
Ruzaini, Abdullah Arshah
Hongwu, Qin
Kebbe, H. Isah
building UMPSA Library
collection Institutional Repository
content_provider Universiti Malaysia Pahang Al-Sultan Abdullah
content_source UMPSA Institutional Repository
continent Asia
country Malaysia
description Learning models used for prediction are mostly developed without paying much attention to the size of the datasets that can produce models of high accuracy and better generalization. The general belief, however, is that a large dataset is needed to construct a predictive learning model. Whether a dataset counts as large is perhaps circumstance dependent; thus, what makes a dataset big or small is vague. In this paper, the ability of a predictive model to generalize with respect to a particular size of data when simulated with new, untrained input is examined. The study experiments on three different sizes of data, using a Matlab program to create predictive models, with a view to establishing whether the size of the data has any effect on the accuracy of a model. The simulated output of each model is measured using the Mean Absolute Error (MAE), and comparisons are made. Findings from this study reveal that the quantity of data partitioned for training must be a good representation of the entire set and sufficient to span the input space. The results of simulating the three network models also show that the learning model with the largest training set appears to be the most accurate and consistently delivers much better and more stable results.
format Article
id my.ump.umpir.6085
institution Universiti Malaysia Pahang
language en
publishDate 2015
publisher Penerbit UMP
record_format eprints
spelling my.ump.umpir.6085 2018-05-18T02:49:20Z PeerReviewed application/pdf Raheem, Ajiboye Adeleke and Ruzaini, Abdullah Arshah and Hongwu, Qin and Kebbe, H. Isah (2015) Evaluating the Effect of Dataset Size on Predictive Model Using Supervised Learning Technique. International Journal of Software Engineering & Computer Sciences (IJSECS), 1. pp. 74-84. ISSN 2289-8522. (Published) http://ijsecs.ump.edu.my/images/archive/vol1/06Ajiboye_IJSECS.pdf
title Evaluating the Effect of Dataset Size on Predictive Model Using Supervised Learning Technique
topic QA76 Computer software
url_provider http://umpir.ump.edu.my/
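
The experimental design summarized in the description (train a model on several dataset sizes, then compare Mean Absolute Error on new, untrained input) can be illustrated with a minimal Python sketch. This is not the authors' Matlab network code: the synthetic sine data, the polynomial least-squares model, the three sizes (20, 200, 2000) and the function names are all assumptions chosen only to show the shape of such a comparison.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """MAE: the average of |y_true - y_pred| over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mae_by_training_size(sizes=(20, 200, 2000), degree=5, n_test=500, seed=0):
    """Fit one model per training-set size and score each on the same test set.

    Hypothetical setup (not from the paper): noisy samples of sin(x) on
    [-3, 3], fitted with a degree-`degree` polynomial via least squares.
    """
    rng = np.random.default_rng(seed)

    def sample(n):
        x = rng.uniform(-3.0, 3.0, size=n)
        y = np.sin(x) + rng.normal(0.0, 0.1, size=n)
        return x, y

    # Held-out data that no model sees during fitting.
    x_test, y_test = sample(n_test)

    results = {}
    for n in sizes:
        x_train, y_train = sample(n)
        coeffs = np.polyfit(x_train, y_train, deg=degree)
        results[n] = mean_absolute_error(y_test, np.polyval(coeffs, x_test))
    return results


if __name__ == "__main__":
    for n, err in mae_by_training_size().items():
        print(f"training size {n:5d}: test MAE = {err:.4f}")
```

Scoring every model against the same held-out set keeps the comparison fair, since differences in MAE then come from the training-set size rather than from different test samples.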