Staff View: A partition based feature selection approach for mixed data clustering / Ashish Dutt

Presently, educational institutions compile and store huge volumes of data, such as student enrolment and attendance records, as well as their examination results. Mining such data yields stimulating information that serves its handlers well. Rapid growth in educational data points to the fact that...

Bibliographic Details
Main Author:	Ashish Dutt
Format:	Thesis
Published:	2020
Subjects:	QA76 Computer software
Online Access:	http://studentsrepo.um.edu.my/14481/2/Ashish_Dutt.pdf http://studentsrepo.um.edu.my/14481/1/Ashish_Dutt.pdf http://studentsrepo.um.edu.my/14481/

id	my.um.stud.14481
record_format	eprints
spelling	my.um.stud.14481 2023-06-07T17:29:07Z A partition based feature selection approach for mixed data clustering / Ashish Dutt Ashish , Dutt QA76 Computer software Presently, educational institutions compile and store huge volumes of data, such as student enrolment and attendance records, as well as their examination results. Mining such data yields stimulating information that serves its handlers well. Rapid growth in educational data points to the fact that distilling massive amounts of data requires a more sophisticated set of algorithms. This issue led to the emergence of the field of Educational Data Mining (EDM). Traditional data mining algorithms cannot be directly applied to educational problems, as they may have a specific objective and function. This implies that a pre-processing algorithm has to be applied first, and only then can specific data mining methods be applied to the problems. One such pre-processing algorithm in EDM is clustering. It is a widely used method in data mining to discover unique patterns in underlying data. It finds patterns by analysing the features in data. A feature contains a measured value. A value can be of an atomic type such as categorical (text only) or numerical (number only). A categorical data type can be ordinal (ordered) or nominal (unordered). In either case, the feature is of a univariate data type. Often, in real-world environments, data consist of both categorical and numerical valued features. Such datasets are called mixed data. In the literature, several clustering methods exist for analysing numerical or categorical data. There are a few clustering algorithms for handling mixed data. Clustering mixed data is dependent on the dissimilarities of its constituent features. This dependence on data types may influence a clustering solution. Assigning appropriate weights to the features, such that the influence of data type is diminished, may improve the performance of a partition clustering algorithm. In this thesis, a novel weighted feature selection approach on nominal features is proposed for a partition clustering algorithm that can handle mixed data. The proposed approach exploits the pre-processing nature of the partition clustering algorithm in the selection of weight assignment for nominal features. The benefits of weighting are demonstrated on both simulated and real-world mixed datasets. The experiments yield better results for weighted nominal features in mixed data clustering. 2020-10 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/14481/2/Ashish_Dutt.pdf application/pdf http://studentsrepo.um.edu.my/14481/1/Ashish_Dutt.pdf Ashish , Dutt (2020) A partition based feature selection approach for mixed data clustering / Ashish Dutt. PhD thesis, Universiti Malaya. http://studentsrepo.um.edu.my/14481/
institution	Universiti Malaya
building	UM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Malaya
content_source	UM Student Repository
url_provider	http://studentsrepo.um.edu.my/
topic	QA76 Computer software
spellingShingle	QA76 Computer software Ashish , Dutt A partition based feature selection approach for mixed data clustering / Ashish Dutt
description	Presently, educational institutions compile and store huge volumes of data, such as student enrolment and attendance records, as well as their examination results. Mining such data yields stimulating information that serves its handlers well. Rapid growth in educational data points to the fact that distilling massive amounts of data requires a more sophisticated set of algorithms. This issue led to the emergence of the field of Educational Data Mining (EDM). Traditional data mining algorithms cannot be directly applied to educational problems, as they may have a specific objective and function. This implies that a pre-processing algorithm has to be applied first, and only then can specific data mining methods be applied to the problems. One such pre-processing algorithm in EDM is clustering. It is a widely used method in data mining to discover unique patterns in underlying data. It finds patterns by analysing the features in data. A feature contains a measured value. A value can be of an atomic type such as categorical (text only) or numerical (number only). A categorical data type can be ordinal (ordered) or nominal (unordered). In either case, the feature is of a univariate data type. Often, in real-world environments, data consist of both categorical and numerical valued features. Such datasets are called mixed data. In the literature, several clustering methods exist for analysing numerical or categorical data. There are a few clustering algorithms for handling mixed data. Clustering mixed data is dependent on the dissimilarities of its constituent features. This dependence on data types may influence a clustering solution. Assigning appropriate weights to the features, such that the influence of data type is diminished, may improve the performance of a partition clustering algorithm. In this thesis, a novel weighted feature selection approach on nominal features is proposed for a partition clustering algorithm that can handle mixed data. The proposed approach exploits the pre-processing nature of the partition clustering algorithm in the selection of weight assignment for nominal features. The benefits of weighting are demonstrated on both simulated and real-world mixed datasets. The experiments yield better results for weighted nominal features in mixed data clustering.
format	Thesis
author	Ashish , Dutt
author_facet	Ashish , Dutt
author_sort	Ashish , Dutt
title	A partition based feature selection approach for mixed data clustering / Ashish Dutt
title_short	A partition based feature selection approach for mixed data clustering / Ashish Dutt
title_full	A partition based feature selection approach for mixed data clustering / Ashish Dutt
title_fullStr	A partition based feature selection approach for mixed data clustering / Ashish Dutt
title_full_unstemmed	A partition based feature selection approach for mixed data clustering / Ashish Dutt
title_sort	partition based feature selection approach for mixed data clustering / ashish dutt
publishDate	2020
url	http://studentsrepo.um.edu.my/14481/2/Ashish_Dutt.pdf http://studentsrepo.um.edu.my/14481/1/Ashish_Dutt.pdf http://studentsrepo.um.edu.my/14481/
_version_	1769842915156164608
score	13.211869
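
The description says that clustering mixed data depends on the dissimilarities of its numerical and nominal features, and that weighting nominal features can reduce the influence of data type. As a rough illustration only, the sketch below combines a squared Euclidean term for numerical features with a weighted mismatch count for nominal features, in the style of k-prototypes. It is not the weighting scheme proposed in the thesis: the function name `mixed_dissimilarity`, the parameter `nominal_weights`, and the sample values are all assumptions made for this example.

```python
from typing import Hashable, Sequence


def mixed_dissimilarity(
    a_num: Sequence[float],
    b_num: Sequence[float],
    a_nom: Sequence[Hashable],
    b_nom: Sequence[Hashable],
    nominal_weights: Sequence[float],
) -> float:
    """Dissimilarity between two mixed-type records (illustrative only).

    Numerical features contribute their squared difference; each nominal
    feature contributes its (hypothetical) weight when the two values differ
    and nothing when they match.
    """
    if len(a_num) != len(b_num):
        raise ValueError("numerical parts must have the same length")
    if not (len(a_nom) == len(b_nom) == len(nominal_weights)):
        raise ValueError("nominal parts and weights must have the same length")

    # Squared Euclidean distance over the numerical features.
    numeric_part = sum((x - y) ** 2 for x, y in zip(a_num, b_num))

    # Weighted simple matching over the nominal features.
    nominal_part = sum(
        w for x, y, w in zip(a_nom, b_nom, nominal_weights) if x != y
    )
    return numeric_part + nominal_part


# Two made-up student records: two numerical and two nominal features each.
d = mixed_dissimilarity(
    (1.0, 2.0), (1.0, 4.0),
    ("full-time", "science"), ("part-time", "science"),
    nominal_weights=(0.5, 0.25),
)
print(d)  # 4.0 from the numerical part + 0.5 from the one nominal mismatch
```

Lowering a nominal weight shrinks that feature's contribution relative to the numerical term, which is the kind of data-type balancing the description refers to; how the weights are actually chosen is the subject of the thesis and is not reproduced here.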

Similar Items