Application of Optimization Methods for Solving Clustering and Classification Problems

Cluster and classification analysis are very interesting data mining topics that can be applied in many fields. Clustering includes the identification of subsets of the data that are similar. Intuitively, samples within a valid cluster are more similar to each other than they are to a sample belongi...

Bibliographic Details
Main Author: Shabanzadeh, Parvaneh
Format: Thesis
Language: English
Published: 2011
Online Access: http://psasir.upm.edu.my/id/eprint/19691/1/IPM_2011_3.pdf
http://psasir.upm.edu.my/id/eprint/19691/
id my.upm.eprints.19691
record_format eprints
spelling my.upm.eprints.19691 2014-05-15T03:32:52Z http://psasir.upm.edu.my/id/eprint/19691/ Application of Optimization Methods for Solving Clustering and Classification Problems Shabanzadeh, Parvaneh 2011-03 Thesis NonPeerReviewed application/pdf en http://psasir.upm.edu.my/id/eprint/19691/1/IPM_2011_3.pdf Shabanzadeh, Parvaneh (2011) Application of Optimization Methods for Solving Clustering and Classification Problems. PhD thesis, Universiti Putra Malaysia.
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
language English
description Cluster analysis and classification are data mining topics with applications in many fields. Clustering identifies subsets of the data whose members are similar. Intuitively, samples within a valid cluster are more similar to each other than they are to samples belonging to a different cluster, and samples in the same cluster share the same label. The aim of data classification is to establish rules for classifying observations when the classes of the data are assumed to be known. Here there is a collection of labelled classes, and the problem is to label a new observation or data point as belonging to one or more of these classes. This thesis focuses on new optimization methods for solving clustering and classification problems. First, we briefly give some background on data analysis, followed by a review of currently available methods for solving clustering and classification problems. The clustering problem is formulated as a non-smooth, non-convex optimization problem, and a new method for solving it is developed. This optimization problem has several characteristics that make it challenging: it has many local minima, the optimization variables can be either continuous or categorical, and no exact analytical derivatives are available. In this study we show how to apply a particular class of optimization methods, known as pattern search methods, to address these challenges. These methods do not explicitly use derivatives and are particularly appropriate when functions are non-smooth. A new algorithm for finding the initial point is also proposed. We establish that the proposed method can produce excellent results compared with previously known methods. Results of computational experiments on real data sets show the robustness and advantages of the new method.
Next, the problem of data classification is studied as a global, non-smooth, non-convex optimization problem; this approach consists of describing clusters for the given training sets. Each data vector is assigned to the closest cluster and, correspondingly, to the set that contains this cluster, and an algorithm based on a derivative-free method is applied to solve this problem. The proposed method has been tested on real-world data sets, and the results of numerical experiments demonstrate the effectiveness of the proposed algorithm.
format Thesis
author Shabanzadeh, Parvaneh
title Application of Optimization Methods for Solving Clustering and Classification Problems
publishDate 2011
url http://psasir.upm.edu.my/id/eprint/19691/1/IPM_2011_3.pdf
http://psasir.upm.edu.my/id/eprint/19691/