Fuzzy Soft Set Clustering for Categorical Data
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Society of Visual Informatics, and Institute of Visual Informatics - UKM and Soft Computing and Data Mining Centre - UTHM, 2024 |
Subjects: | |
Online Access: | http://ir.unimas.my/id/eprint/47246/1/2364-6612-1-PB.pdf http://ir.unimas.my/id/eprint/47246/ https://joiv.org/index.php/joiv/article/view/2364 http://dx.doi.org/10.62527/joiv.8.1.2364 |
Summary: | Clustering categorical data is difficult because such data lacks a natural order and can contain groups that are
related only in specific dimensions. Conventional clustering methods, such as k-means, cannot be applied directly to categorical data.
Numerous clustering algorithms for categorical data, for instance fuzzy k-modes and its enhancements, have been developed to overcome
this issue. However, these approaches still produce clusters with low purity and weak intra-cluster similarity. Furthermore, transforming
categorical attributes into binary values can be computationally costly. This research proposes a fuzzy clustering technique for
categorical data based on soft set theory and the multinomial distribution. Experiments showed that the proposed approach performs
better in purity, rank index, and response time, by up to 97.53%. Many algorithms can be used to cluster categorical data with
fuzzy-based methods. However, these techniques do not always improve cluster purity or response time. As a solution, hard categorical
data clustering through the multinomial distribution is suggested. This involves producing a multi-soft set by using a rotated-based
soft set and then clustering the data using a multivariate multinomial distribution. A comparison with established baseline algorithms
demonstrates that the suggested approach excels in purity, rank index, and response time, achieving improvements of up to 97.53% over
existing methods. |
---|
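The pipeline outlined in the summary (decompose the categorical table into a multi-soft set, cluster with a multivariate multinomial distribution, score by purity) can be illustrated with a minimal sketch. This is not the authors' algorithm: the rotated-based soft set step is not reproduced, and the clustering part is a generic EM fit of a multinomial mixture that yields fuzzy memberships. The function names `multi_soft_set`, `fit_memberships`, and `purity`, and the smoothing parameter `alpha`, are assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def multi_soft_set(rows):
    """One soft set per attribute: category value -> set of row indices holding it."""
    soft_sets = [defaultdict(set) for _ in range(len(rows[0]))]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            soft_sets[j][value].add(i)
    return [dict(s) for s in soft_sets]


def fit_memberships(rows, k, n_iter=50, seed=0, alpha=1e-2):
    """Generic EM for a mixture of multivariate multinomials (illustrative only).

    Returns u, where u[i][c] is the degree to which row i belongs to cluster c.
    alpha is an assumed smoothing constant that keeps probabilities above zero.
    """
    rng = random.Random(seed)
    n = len(rows)
    soft_sets = multi_soft_set(rows)
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        w = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(w)
        u.append([x / total for x in w])
    for _ in range(n_iter):
        # M-step: cluster weights and per-attribute category probabilities.
        weight = [sum(u[i][c] for i in range(n)) for c in range(k)]
        pi = [(weight[c] + alpha) / (n + k * alpha) for c in range(k)]
        theta = []  # theta[c][j][value]
        for c in range(k):
            per_attr = []
            for s in soft_sets:
                denom = weight[c] + alpha * len(s)
                per_attr.append({
                    v: (sum(u[i][c] for i in members) + alpha) / denom
                    for v, members in s.items()
                })
            theta.append(per_attr)
        # E-step: posterior membership of each row, computed in log space.
        for i, row in enumerate(rows):
            logp = [
                math.log(pi[c])
                + sum(math.log(theta[c][j][v]) for j, v in enumerate(row))
                for c in range(k)
            ]
            top = max(logp)
            w = [math.exp(x - top) for x in logp]
            total = sum(w)
            u[i] = [x / total for x in w]
    return u


def purity(labels_pred, labels_true):
    """Fraction of objects that fall in the majority true class of their cluster."""
    by_cluster = defaultdict(Counter)
    for p, t in zip(labels_pred, labels_true):
        by_cluster[p][t] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels_true)
```

A hard partition can be read off the memberships by taking the index of the largest entry in each row of `u` and passing the result to `purity` together with the known class labels. Working from the value-to-objects sets avoids building an explicit binary matrix, which relates to the cost concern about binary transformation raised in the summary.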