A COMPARISON STUDY OF DATA CLUSTERING AND VISUALISATION TECHNIQUES WITH VARIOUS DATA TYPES
| Main Author: | |
|---|---|
| Format: | Final Year Project Report |
| Language: | English |
| Published: | Universiti Malaysia Sarawak (UNIMAS), 2020 |
| Subjects: | |
| Online Access: | http://ir.unimas.my/id/eprint/32941/1/Ling%20Chien%20-%2024%20pgs.pdf, http://ir.unimas.my/id/eprint/32941/4/Ling%20Chien%20ft.pdf, http://ir.unimas.my/id/eprint/32941/ |
Summary: Clustering is used to identify the intrinsic groupings in a set of unlabelled data, and it can be applied in exploratory data mining and statistical data analysis. Clustering plays an important role in today's digital environment: as the quantity and complexity of data on the internet keep increasing, clustering methods have become indispensable for finding patterns in data. Many types of clustering techniques have been developed, including partitioning methods, hierarchical clustering, density-based clustering, model-based clustering, and fuzzy clustering. This study focuses on only three of them: k-means clustering; agglomerative hierarchical clustering with Ward's linkage, complete linkage, and average linkage; and the Self-Organizing Map (SOM). The clustering algorithms were written in Python by modifying code obtained from the Internet. The project conducts experiments on the visualisation and performance of the selected clustering methods, together with a case study that applies clustering to online product reviews.

The visualisation experiment showed that each clustering technique offers its own form of visualisation for cluster analysis. The predictive accuracy results indicated that k-means clustering and the SOM are the most suitable of the techniques studied for cluster analysis. The case study concluded that the accuracy of clustering online product reviews is related to the structure and number of sentences in the reviews. Since this relationship is now known, extractive text summarisation based on clustering can be improved and further developed for use in customer review systems.
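The summary compares k-means with agglomerative clustering under Ward's, complete, and average linkage, and reports a "predictive accuracy" for each. A minimal sketch of such a comparison follows. It assumes scikit-learn's `KMeans` and `AgglomerativeClustering`, and it assumes accuracy is measured as cluster purity (each cluster is credited with its majority true label). The summary does not say how the project computed accuracy, and this is not the project's own code. The function names and toy data are hypothetical. The SOM is left out because scikit-learn does not provide one.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans


def purity_accuracy(y_true, y_pred):
    """Share of points whose cluster's majority true label matches their own label.

    Assumed stand-in for "predictive accuracy"; the project's exact measure is unknown.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    correct = 0
    for cluster in np.unique(y_pred):
        members = y_true[y_pred == cluster]
        # Count the members that carry the most common true label in this cluster.
        correct += np.bincount(members).max()
    return correct / len(y_true)


def compare_methods(X, y_true, n_clusters):
    """Cluster X with each method and return {method name: purity accuracy}."""
    models = {
        "k-means": KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
        "ward": AgglomerativeClustering(n_clusters=n_clusters, linkage="ward"),
        "complete": AgglomerativeClustering(n_clusters=n_clusters, linkage="complete"),
        "average": AgglomerativeClustering(n_clusters=n_clusters, linkage="average"),
    }
    return {name: purity_accuracy(y_true, model.fit_predict(X)) for name, model in models.items()}


if __name__ == "__main__":
    # Hypothetical toy data: three well-separated Gaussian blobs with known labels.
    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
    X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
    y = np.repeat(np.arange(3), 50)
    for name, acc in compare_methods(X, y, n_clusters=3).items():
        print(f"{name:>8}: {acc:.3f}")
```

On clearly separated blobs all four methods should score close to 1.0. Differences between linkages only show up on data with elongated, overlapping, or unevenly sized clusters, which is where such a comparison becomes informative.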
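The case study pairs clustering with extractive text summarisation of product reviews. One common way to do this is to cluster the sentences and keep one representative sentence per cluster. The sketch below follows that approach under assumptions the summary does not confirm: TF-IDF sentence vectors, scikit-learn's `KMeans`, and choosing the sentence nearest each centroid. The function name `cluster_summary` and the sample review sentences are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_summary(sentences, n_sentences):
    """Return up to n_sentences representative sentences, kept in original order.

    Sentences are embedded as TF-IDF vectors and grouped with k-means; from each
    cluster the sentence closest to the cluster centroid is kept.
    """
    if n_sentences >= len(sentences):
        return list(sentences)  # nothing to cut
    X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    km = KMeans(n_clusters=n_sentences, n_init=10, random_state=0)
    labels = km.fit_predict(X)
    dist = km.transform(X)  # distance from every sentence to every centroid
    chosen = []
    for c in range(n_sentences):
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            continue  # guard: k-means can in rare cases leave a cluster empty
        # The member nearest its own centroid stands in for the whole cluster.
        chosen.append(int(members[np.argmin(dist[members, c])]))
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    # Hypothetical review sentences covering three topics.
    review = [
        "The battery lasts all day and charges quickly.",
        "Battery life is excellent and charging is fast.",
        "The screen is bright and sharp.",
        "Display colours look vivid and the screen is sharp.",
        "Delivery was late and the box was damaged.",
        "Shipping took two weeks and the package arrived damaged.",
    ]
    for sentence in cluster_summary(review, 3):
        print(sentence)
```

Because each cluster contributes only one sentence, short reviews with few sentences leave little to cluster. This is consistent with the summary's observation that clustering accuracy depends on the structure and number of sentences.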