Enhanced mechanism to handle missing data of Hadith classifier
Main Authors: | , , |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2011 |
Subjects: | |
Online Access: | http://irep.iium.edu.my/11307/2/Enhanced_mechanism_to_handle_missing_data_of_Hadith_classifier_11307.pdf http://irep.iium.edu.my/11307/ http://www.ontariointernational.org/ConferenceMalaysia2011.htm |
Summary: | Tree-structured modeling is a data mining technique that recursively partitions a data set into relatively homogeneous subgroups in order to make more accurate predictions on future instances. Decision tree algorithms are able to deal with missing values or wrong data. While this ability is considered an advantage, the extreme effort required to achieve it is considered a drawback. The correct branch to take is unknown if a tested feature is missing, so the algorithm must employ enhanced mechanisms to handle missing values. Moreover, ignoring missing data can have critical consequences for the decisions of users or administrators, especially in cases relating to religion. The Hadith classifier is a method for classifying Hadiths into four major classes, Sahih, Hasan, Da'ef and Maudo', according to the status of the Isnad (chain of narrators). This research produced a mechanism to deal with missing data in a Hadith database. A total of 999 Hadiths from Sahih Al-Bukhari, Jami'u Al-Termithi and Selseelt AlaHadith Aldae'ifah w' Almadu'h formed the sample of this study. The attributes of the Hadith database were derived according to the validation methods of Hadith science, and the experiment applied the C4.5 algorithm to extract the classification rules. The experiment had two phases, training and testing. In the first phase, the machine learned from the training dataset while a detector identified missing data and replaced each missing value with the correct attribute according to the validity method. In the second phase, the machine detected any missing data, replaced it with the correct attribute, and dealt with passive narrator chains. The findings showed that the proposed approach improved the accuracy of the classifier by 1.65%, while the time to build the classifier was affected by 0.05 seconds. With the naïve Bayes algorithm, accuracy improved by 0.6%. In contrast to the C4.5 algorithm, the time to build the classifier remained unchanged at 0.02 seconds. Furthermore, in both cases the accuracy of the classifier was positively affected by the size of the training dataset. |
---|
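
The summary describes detecting missing attribute values and replacing them before classification, but the record does not give the replacement rule itself (the "validity method"). As a rough illustration only, the sketch below swaps in a generic stand-in, most-frequent-value imputation, learned from the training rows and reused on unseen rows. The attribute names (`chain_connected`, `narrator_reliability`), the `None` missing marker, and the helper functions are all hypothetical and are not taken from the paper.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing attribute value


def fit_fill_values(rows):
    """Learn, for each attribute, its most frequent non-missing value."""
    fill = {}
    for attr in rows[0]:
        observed = [r[attr] for r in rows if r[attr] is not MISSING]
        if observed:  # attributes with no observed value get no fill entry
            fill[attr] = Counter(observed).most_common(1)[0][0]
    return fill


def impute(row, fill):
    """Return a copy of row with missing values replaced where a fill exists."""
    return {
        attr: (fill.get(attr, MISSING) if value is MISSING else value)
        for attr, value in row.items()
    }


# Hypothetical training records; real Hadith attributes would differ.
training = [
    {"chain_connected": "yes", "narrator_reliability": "high"},
    {"chain_connected": "yes", "narrator_reliability": None},
    {"chain_connected": "no", "narrator_reliability": "low"},
    {"chain_connected": None, "narrator_reliability": "high"},
]

# Training phase: learn fill values and clean the training rows.
fill_values = fit_fill_values(training)
cleaned_training = [impute(r, fill_values) for r in training]

# Testing phase: reuse the same fill values on an unseen record.
unseen = {"chain_connected": None, "narrator_reliability": "low"}
cleaned_unseen = impute(unseen, fill_values)
```

The cleaned rows could then be passed to any classifier, such as a decision tree or naïve Bayes learner. Learning the fill values only from the training rows keeps the test phase from influencing them.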