Data pre-processing of website browsing record: An initial step for web page classification

Internet utilization has resulted in an increase in the number of web pages on the World Wide Web. Web page classification is required to organize this growing number of pages. A web page classification system is proposed to be constructed using a deep learning algorithm. The initial...

Bibliographic Details
Main Authors: Siti Hawa, Apandi; Jamaludin, Sallim; Rozlina, Mohamed
Format: Conference or Workshop Item
Language: English
Published: IEEE 2021
Subjects: Q Science (General); QA76 Computer software
Online Access: http://umpir.ump.edu.my/id/eprint/33291/1/Data%20pre-processing%20of%20website%20browsing%20record_FULL.pdf
http://umpir.ump.edu.my/id/eprint/33291/13/Data_pre-processing_of_website_browsing_record_An_initial_step_for_web_page.pdf
http://umpir.ump.edu.my/id/eprint/33291/
https://doi.org/10.1109/ICSECS52883.2021.00129
id my.ump.umpir.33291
record_format eprints
spelling my.ump.umpir.33291 2024-01-16T06:13:45Z PeerReviewed
Siti Hawa, Apandi and Jamaludin, Sallim and Rozlina, Mohamed (2021) Data pre-processing of website browsing record: An initial step for web page classification. In: 7th International Conference on Software Engineering and Computer Systems and 4th International Conference on Computational Science and Information Management, ICSECS-ICOCSIM 2021, 24-26 Aug. 2021, Pekan, Malaysia. pp. 679-684. ISSN 978-166541407-4 ISBN 978-1-6654-1408-1 https://doi.org/10.1109/ICSECS52883.2021.00129
institution Universiti Malaysia Pahang Al-Sultan Abdullah
building UMPSA Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Pahang Al-Sultan Abdullah
content_source UMPSA Institutional Repository
url_provider http://umpir.ump.edu.my/
language English
topic Q Science (General)
QA76 Computer software
description Internet utilization has resulted in an increase in the number of web pages on the World Wide Web. Web page classification is required to organize this growing number of pages. A web page classification system is proposed to be constructed using a deep learning algorithm. The initial step for web page classification is data pre-processing. A website browsing record is used as the dataset in this study. The raw dataset needs to be pre-processed to obtain cleaned data by removing records with missing values, redundant data, and erroneous data. Data pre-processing involves many steps, including data cleaning and web content pre-processing. The main contribution of this paper is to investigate how to pre-process website browsing records, focusing on the Game and Online Video web pages that will be used as the dataset to construct the web page classification model. After data pre-processing, the number of records in the dataset is reduced. This shows that many records were removed because they were inactive and not suitable to be used in this study as the dataset of Game and Online Video web pages.
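The cleaning step described in the abstract (dropping records with missing values, redundant records, and error data) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout, the `url` field name, and the helper names `is_well_formed` and `clean_browsing_records` are assumptions made for illustration, and "error data" is interpreted here simply as a URL that does not parse as an http(s) address. Checking whether a page is still active would require network requests and is left out.

```python
from urllib.parse import urlsplit

# Hypothetical field name; the record schema used in the paper is not given here.
URL_FIELD = "url"


def is_well_formed(url):
    """Return True if the URL has an http(s) scheme and a host name."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs (e.g. bad IPv6 brackets).
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


def clean_browsing_records(records):
    """Drop records whose URL is missing, malformed, or already seen.

    The first occurrence of each URL is kept; order is preserved.
    """
    seen = set()
    cleaned = []
    for record in records:
        url = (record.get(URL_FIELD) or "").strip()
        if not url:                    # missing value
            continue
        if not is_well_formed(url):    # error data (assumed meaning: unparsable URL)
            continue
        if url in seen:                # redundant data
            continue
        seen.add(url)
        cleaned.append({**record, URL_FIELD: url})
    return cleaned


if __name__ == "__main__":
    raw = [
        {"url": "https://example.com/game"},
        {"url": ""},
        {"url": "https://example.com/game"},
        {"url": "not a url"},
    ]
    print(clean_browsing_records(raw))
```

Duplicates are matched on the exact URL string after trimming whitespace; a real pipeline might normalise URLs further before comparing them.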
format Conference or Workshop Item
author Siti Hawa, Apandi
Jamaludin, Sallim
Rozlina, Mohamed
title Data pre-processing of website browsing record: An initial step for web page classification
publisher IEEE
publishDate 2021