A modified π rough k-means algorithm for web page recommendation system

Bibliographic Details
Main Author: Zidane, Khaled Ali Othman
Format: Thesis
Language: English
Published: 2018
Online Access: http://psasir.upm.edu.my/id/eprint/68816/1/FSKTM%202018%2023%20-%20IR.pdf
http://psasir.upm.edu.my/id/eprint/68816/
Description
Summary: A Web page recommendation system is an application of Web Usage Mining (WUM) that predicts a user's next browsing activity in real time in order to provide personalized recommendations. To date, many studies have investigated the use of data mining techniques (e.g., clustering) in Web page applications. Most of these efforts use partitional clustering algorithms to discover user profiles in order to improve recommendation quality. However, current solutions have only achieved accuracy in the range of 60-70%. This is due to the weakness of the partitive clustering criterion in handling overlapping user profiles, which requires more attention. To tackle this problem, a modified algorithm for Web page recommendation is proposed. The ultimate goal is to improve recommendation quality and thereby increase prediction accuracy. To this end, the study pursues several objectives. Firstly, an extended K-Means clustering algorithm (called X-Means) is proposed to filter noise from user session data, eliminating outliers and irrelevant pages. Secondly, a modified πRKM algorithm is proposed to partition the user session data. The modified πRKM produces a better partition by identifying the overlapping objects between the correct clusters, and it can also re-partition the data using the indiscernibility relation function. Thirdly, a local and global similarity algorithm is proposed to classify the current user's page requests in order to produce recommendations. Several datasets were used in extensive experiments. Firstly, the Iris and Vowel datasets were used to assess the effectiveness of the proposed modified πRKM, with rough classifier assessment strategies used to measure the quality of overlapping classes.
The experimental results revealed that the modified πRKM algorithm performed better than the previous version in correctly identifying overlapping objects between positive clusters. Secondly, the CTI dataset, which existing research has shown to be a more suitable Web server log in terms of Web page recommendation quality, was used to measure the performance of the proposed modified algorithm for the Web page recommendation system. The experiment was divided into three interdependent stages of the usage mining process: data preparation, pattern discovery, and recommendation. In the data preparation stage, the quality of the prepared data was measured by the Local Outlier Factor (LOF) method. The experimental results revealed that the degree of user session outliers was lower than with the previous method. In the pattern discovery stage, the partitions of user sessions produced by the modified πRKM algorithm were measured by the Davies-Bouldin Index (DBI). The experimental results revealed that the modified πRKM algorithm significantly affected the quality of the partitions obtained. In the recommendation stage, the results of the recommendation engine were measured using three accuracy parameters: Precision, Coverage, and F-measure. The proposed modified algorithm for the Web page recommendation system achieved an accuracy of 76-82%, significantly outperforming previous work.
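
The summary names Precision, Coverage, and F-measure without defining them. As an illustration only, the sketch below uses the definitions common in the Web usage mining literature: precision is the share of recommended pages the user actually visited, coverage is the share of visited pages that were recommended, and F-measure is their harmonic mean. These definitions, the function name `recommendation_scores`, and the sample page paths are assumptions; the thesis may define or compute the measures differently.

```python
def recommendation_scores(recommended, visited):
    """Return (precision, coverage, f_measure) for one user session.

    Assumed definitions (not taken from the thesis):
      precision = |recommended ∩ visited| / |recommended|
      coverage  = |recommended ∩ visited| / |visited|
      f_measure = harmonic mean of precision and coverage
    """
    recommended, visited = set(recommended), set(visited)
    hits = len(recommended & visited)

    # Guard against empty sets so the function never divides by zero.
    precision = hits / len(recommended) if recommended else 0.0
    coverage = hits / len(visited) if visited else 0.0

    denom = precision + coverage
    f_measure = 2 * precision * coverage / denom if denom else 0.0
    return precision, coverage, f_measure


# Hypothetical session: four recommended pages, three pages actually visited.
precision, coverage, f_measure = recommendation_scores(
    ["/a", "/b", "/c", "/d"], ["/b", "/c", "/e"]
)
print(precision, coverage, f_measure)
```

In this hypothetical session two of the four recommendations are hits, so precision is 0.5, coverage is 2/3, and the F-measure is 4/7.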