Improved Parameterless K-Means: Auto-Generation Centroids and Distance Data Point Clusters

Bibliographic Details
Main Authors: Wan Maseri, Wan Mohd; Beg, Abul Hashem; Herawan, Tutut; Noraziah, Ahmad
Format: Article
Language: English
Published: IGI Global 2011
Subjects: QA75 Electronic computers. Computer science
Online Access:
Full text (PDF): http://umpir.ump.edu.my/id/eprint/9328/7/improved-parameterless-k-means_-auto-generation-centroids-and-distance-data-point-clusters%281%29.pdf
Repository record: http://umpir.ump.edu.my/id/eprint/9328/
Publisher page: http://www.igi-global.com/article/improved-parameterless-means/64168
Repository: UMPSA Institutional Repository (UMPSA Library), Universiti Malaysia Pahang Al-Sultan Abdullah, Malaysia

Abstract: K-means is an unsupervised, partitioning clustering algorithm. It is popular and widely used for its simplicity and speed. K-means clustering produces a number of separate, flat (non-hierarchical) clusters and is suitable for generating globular clusters. The main drawback of the k-means algorithm is that the user must specify the number of clusters in advance. This paper presents an improved version of the K-means algorithm that automatically generates the initial number of clusters (k) and uses a new approach to defining the initial centroids, for an effective and efficient clustering process. The underlying mechanism has been analyzed and tested experimentally. The experimental results show that the number of iterations is reduced to 50% and that the run time is lower and depends consistently on the maximum distance of the data points, regardless of how many data points there are.
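The drawback named in the abstract, that k must be fixed before clustering starts, is visible in the conventional algorithm itself. The sketch below is standard Lloyd-style k-means in plain Python, not the authors' improved parameterless method (the record does not give its details); the function name `kmeans`, the random choice of k data points as initial centroids, and the `max_iter` and `seed` parameters are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _closest(point: Sequence[float], centroids: List[Point]) -> int:
    """Index of the centroid nearest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int], int]:
    """Conventional (Lloyd's) k-means. The caller must supply k up front.

    Returns (centroids, labels, iterations_used).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Hypothetical initialisation: k distinct data points drawn at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest centroid.
        labels = [_closest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        # Converged: no centroid moved, so assignments are now stable.
        if new_centroids == centroids:
            return centroids, labels, iteration
        centroids = new_centroids
    return centroids, labels, max_iter


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
    centroids, labels, n_iter = kmeans(data, k=2)
    print(centroids, labels, n_iter)
```

Both inputs the paper targets appear here as caller choices: `k` is a required argument, and the starting centroids come from a random draw, so different seeds can lead to different iteration counts.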

Citation: Wan Maseri, Wan Mohd and Beg, Abul Hashem and Herawan, Tutut and Noraziah, Ahmad (2011) Improved Parameterless K-Means: Auto-Generation Centroids and Distance Data Point Clusters. International Journal of Information Retrieval Research (IJIRR), 1 (3). pp. 1-14. ISSN 2155-6377 (print); 2155-6385 (online). DOI: 10.4018/ijirr.2011070101. (Published July 2011; peer reviewed.)