Semi-supervised learning for feature selection and classification of data / Ganesh Krishnasamy

Bibliographic Details
Main Author: Ganesh, Krishnasamy
Format: Thesis
Published: 2019
Subjects:
Online Access:http://studentsrepo.um.edu.my/10006/1/Ganesh_Krishmasamy.pdf
http://studentsrepo.um.edu.my/10006/2/Ganesh_Krishnasamy_%2D_Thesis.pdf
http://studentsrepo.um.edu.my/10006/
id my.um.stud.10006
record_format eprints
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Student Repository
url_provider http://studentsrepo.um.edu.my/
topic TK Electrical engineering. Electronics Nuclear engineering
spellingShingle TK Electrical engineering. Electronics Nuclear engineering
Ganesh, Krishnasamy
Semi-supervised learning for feature selection and classification of data / Ganesh Krishnasamy
description Feature selection and classification are widely utilized for data analysis. Recently, considerable advancement has been achieved in semi-supervised multi-task feature selection algorithms, which exploit the information shared among multiple related tasks. However, these semi-supervised multi-task feature selection algorithms cannot naturally handle multi-view data since they are designed for single-view data. Existing studies have demonstrated that mining the information enclosed in multiple views can drastically enhance feature selection performance. As for classification, researchers have applied semi-supervised learning to the extreme learning machine (ELM), exploiting both labeled and unlabeled data to boost learning performance. They have incorporated Laplacian regularization to capture the geometry of the underlying manifold. However, Laplacian regularization lacks extrapolating power and biases the solution towards a constant function. These drawbacks degrade the performance of Laplacian-regularized semi-supervised ELMs when only a few labeled samples are available. In the first part of the study, a novel mathematical framework is introduced for multi-view Laplacian semi-supervised feature selection that mines the correlations among multiple tasks. The proposed algorithm exploits complementary information from different feature views in each task while exploring the knowledge shared between multiple related tasks in a joint framework when labeled training data are sparse. Since the objective function is non-smooth and difficult to solve, an efficient iterative algorithm is developed to optimize it. The proposed algorithm is compared with state-of-the-art feature selection algorithms on three datasets: a consumer video dataset, a 3D motion recognition dataset and a handwritten digit recognition dataset.
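Objectives of this kind are commonly built around a non-smooth ℓ2,1 row-sparsity penalty, optimized by iterative reweighting. As a rough illustration only, here is a minimal single-view, single-task sketch; the function name, toy data, and solver details are assumptions for illustration, not the thesis's actual multi-view, multi-task formulation:

```python
import numpy as np

def l21_feature_select(X, Y, lam=0.1, k=2, n_iter=50, eps=1e-8):
    """Iteratively reweighted solver for min ||XW - Y||_F^2 + lam * ||W||_{2,1}.
    This is the standard non-smooth objective behind sparse feature selection,
    simplified to a single view and a single task. Returns the indices of the
    k features with the largest row norms of W, plus W itself."""
    d = X.shape[1]
    D = np.eye(d)                                   # reweighting matrix, updated each pass
    for _ in range(n_iter):
        # closed-form subproblem: (X^T X + lam * D) W = X^T Y
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
        row = np.linalg.norm(W, axis=1)             # per-feature importance scores
        D = np.diag(1.0 / (2.0 * np.maximum(row, eps)))
    return np.argsort(row)[::-1][:k], W

# hypothetical toy data: only features 0 and 2 carry signal about the targets
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
Y = np.stack([X[:, 0] + 0.1 * rng.standard_normal(100),
              X[:, 2] + 0.1 * rng.standard_normal(100)], axis=1)
idx, W = l21_feature_select(X, Y, lam=1.0, k=2)
```

The returned indices would then slice both the training and testing feature matrices before classification, in the spirit of the SVM pipeline described in the experiments.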
In these experiments, all training and testing data are represented as feature vectors. The proposed algorithm learns sparse coefficients by exploiting the relationships among different multi-view features and leveraging the knowledge from multiple related tasks. The sparse coefficients are then applied to the feature vectors of both the training and testing data to select the most representative features, which are fed into a linear support vector machine (SVM) for classification. The experimental results show that the proposed feature selection framework outperforms other state-of-the-art feature selection algorithms. In the second part of the study, a novel classification algorithm called the Hessian semi-supervised ELM (HSS-ELM) is proposed to enhance the semi-supervised learning of ELM. Unlike Laplacian regularization, Hessian regularization favours functions whose values vary linearly along geodesic distances and preserves the local manifold structure well, leading to good extrapolating power. Furthermore, HSS-ELM retains almost all the advantages of the traditional ELM, such as high training efficiency and straightforward implementation for multiclass classification problems. The proposed algorithm is tested on publicly available datasets: G50C, COIL20(B), COIL20, USPST(B) and USPST. The experimental results demonstrate that the proposed algorithm is competitive with state-of-the-art semi-supervised learning algorithms in terms of accuracy, while requiring remarkably less training time than semi-supervised SVM/regularized least-squares algorithms.
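The general shape of a graph-regularized semi-supervised ELM can be sketched as follows. This is a hedged approximation only: it uses a plain graph Laplacian `L` where HSS-ELM uses a Hessian-based regularizer, and all names, parameters, and toy data are assumptions rather than the thesis's implementation:

```python
import numpy as np

def train_ss_elm(X, Y, labeled, L, n_hidden=50, lam=0.1, seed=0):
    """Semi-supervised ELM sketch: a random hidden layer followed by a
    closed-form ridge solve, with a fitting penalty on labeled rows only
    and a manifold-smoothness penalty lam * beta^T H^T L H beta."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((X.shape[1], n_hidden))   # random input weights
    b = rng.standard_normal(n_hidden)                 # random hidden biases
    H = np.tanh(X @ A + b)                            # hidden-layer outputs
    C = np.diag(labeled.astype(float))                # weight errors on labeled rows only
    # output weights: (I + H^T C H + lam * H^T L H) beta = H^T C Y
    lhs = np.eye(n_hidden) + H.T @ C @ H + lam * (H.T @ L @ H)
    beta = np.linalg.solve(lhs, H.T @ C @ Y)
    return A, b, beta

def predict(X, A, b, beta):
    return np.tanh(X @ A + b) @ beta

# hypothetical toy data: two tight clusters, one labeled point in each
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
Y = np.zeros((6, 2)); Y[0, 0] = 1.0; Y[3, 1] = 1.0   # only rows 0 and 3 labeled
labeled = np.array([True, False, False, True, False, False])
# RBF similarity graph and its unnormalized Laplacian
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-D2); np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W
A, b, beta = train_ss_elm(X, Y, labeled, L)
pred = predict(X, A, b, beta).argmax(axis=1)          # predicted cluster per point
```

The smoothness term propagates the two labels to the unlabeled cluster-mates; swapping `L` for a Hessian energy estimate is the change the thesis's second part argues for.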
format Thesis
author Ganesh, Krishnasamy
author_facet Ganesh, Krishnasamy
author_sort Ganesh, Krishnasamy
title Semi-supervised learning for feature selection and classification of data / Ganesh Krishnasamy
title_short Semi-supervised learning for feature selection and classification of data / Ganesh Krishnasamy
title_full Semi-supervised learning for feature selection and classification of data / Ganesh Krishnasamy
title_fullStr Semi-supervised learning for feature selection and classification of data / Ganesh Krishnasamy
title_full_unstemmed Semi-supervised learning for feature selection and classification of data / Ganesh Krishnasamy
title_sort semi-supervised learning for feature selection and classification of data / ganesh krishnasamy
publishDate 2019
url http://studentsrepo.um.edu.my/10006/1/Ganesh_Krishmasamy.pdf
http://studentsrepo.um.edu.my/10006/2/Ganesh_Krishnasamy_%2D_Thesis.pdf
http://studentsrepo.um.edu.my/10006/
_version_ 1738506311741472768
spelling my.um.stud.100062020-02-05T17:21:51Z Semi-supervised learning for feature selection and classification of data / Ganesh Krishnasamy Ganesh, Krishnasamy TK Electrical engineering. Electronics Nuclear engineering Feature selection and classification are widely utilized for data analysis. Recently, considerable advancement has been achieved in semi-supervised multi-task feature selection algorithms, which exploit the information shared among multiple related tasks. However, these semi-supervised multi-task feature selection algorithms cannot naturally handle multi-view data since they are designed for single-view data. Existing studies have demonstrated that mining the information enclosed in multiple views can drastically enhance feature selection performance. As for classification, researchers have applied semi-supervised learning to the extreme learning machine (ELM), exploiting both labeled and unlabeled data to boost learning performance. They have incorporated Laplacian regularization to capture the geometry of the underlying manifold. However, Laplacian regularization lacks extrapolating power and biases the solution towards a constant function. These drawbacks degrade the performance of Laplacian-regularized semi-supervised ELMs when only a few labeled samples are available. In the first part of the study, a novel mathematical framework is introduced for multi-view Laplacian semi-supervised feature selection that mines the correlations among multiple tasks. The proposed algorithm exploits complementary information from different feature views in each task while exploring the knowledge shared between multiple related tasks in a joint framework when labeled training data are sparse. Since the objective function is non-smooth and difficult to solve, an efficient iterative algorithm is developed to optimize it.
The proposed algorithm is compared with state-of-the-art feature selection algorithms on three datasets: a consumer video dataset, a 3D motion recognition dataset and a handwritten digit recognition dataset. In these experiments, all training and testing data are represented as feature vectors. The proposed algorithm learns sparse coefficients by exploiting the relationships among different multi-view features and leveraging the knowledge from multiple related tasks. The sparse coefficients are then applied to the feature vectors of both the training and testing data to select the most representative features, which are fed into a linear support vector machine (SVM) for classification. The experimental results show that the proposed feature selection framework outperforms other state-of-the-art feature selection algorithms. In the second part of the study, a novel classification algorithm called the Hessian semi-supervised ELM (HSS-ELM) is proposed to enhance the semi-supervised learning of ELM. Unlike Laplacian regularization, Hessian regularization favours functions whose values vary linearly along geodesic distances and preserves the local manifold structure well, leading to good extrapolating power. Furthermore, HSS-ELM retains almost all the advantages of the traditional ELM, such as high training efficiency and straightforward implementation for multiclass classification problems. The proposed algorithm is tested on publicly available datasets: G50C, COIL20(B), COIL20, USPST(B) and USPST. The experimental results demonstrate that the proposed algorithm is competitive with state-of-the-art semi-supervised learning algorithms in terms of accuracy, while requiring remarkably less training time than semi-supervised SVM/regularized least-squares algorithms.
2019 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/10006/1/Ganesh_Krishmasamy.pdf application/pdf http://studentsrepo.um.edu.my/10006/2/Ganesh_Krishnasamy_%2D_Thesis.pdf Ganesh, Krishnasamy (2019) Semi-supervised learning for feature selection and classification of data / Ganesh Krishnasamy. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/10006/