Automatic Topic-Based Web Page Classification Using Deep Learning
People frequently browse the internet on smartphones, laptops, or other computers to search for information on the web. As the amount of information on the web increases, the number of web pages grows day by day. Automatic topic-based web page classification is used to manage the excessive a...
Main Authors: | Siti Hawa, Apandi; Jamaludin, Sallim; Rozlina, Mohamed; Norkhairi, Ahmad |
---|---|
Format: | Article |
Language: | English |
Published: | Politeknik Negeri Padang 2023 |
Subjects: | QA75 Electronic computers. Computer science; QA76 Computer software |
Online Access: | http://umpir.ump.edu.my/id/eprint/40036/1/Automatic%20Topic-Based%20Web%20Page%20Classification%20Using%20Deep%20Learning.pdf http://umpir.ump.edu.my/id/eprint/40036/ https://dx.doi.org/10.30630/joiv.7.3-2.1616 |
id |
my.ump.umpir.40036 |
---|---|
record_format |
eprints |
spelling |
my.ump.umpir.40036 2024-01-16T07:06:17Z http://umpir.ump.edu.my/id/eprint/40036/ Automatic Topic-Based Web Page Classification Using Deep Learning Siti Hawa, Apandi; Jamaludin, Sallim; Rozlina, Mohamed; Norkhairi, Ahmad QA75 Electronic computers. Computer science; QA76 Computer software
Politeknik Negeri Padang 2023-11 Article PeerReviewed pdf en cc_by_nc_sa_4 http://umpir.ump.edu.my/id/eprint/40036/1/Automatic%20Topic-Based%20Web%20Page%20Classification%20Using%20Deep%20Learning.pdf Siti Hawa, Apandi and Jamaludin, Sallim and Rozlina, Mohamed and Norkhairi, Ahmad (2023) Automatic Topic-Based Web Page Classification Using Deep Learning. International Journal on Informatics Visualization, 7 (3-2). pp. 2108-2114. ISSN 2549-9904. (Published) https://dx.doi.org/10.30630/joiv.7.3-2.1616 10.30630/joiv.7.3-2.1616 |
institution |
Universiti Malaysia Pahang Al-Sultan Abdullah |
building |
UMPSA Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaysia Pahang Al-Sultan Abdullah |
content_source |
UMPSA Institutional Repository |
url_provider |
http://umpir.ump.edu.my/ |
language |
English |
topic |
QA75 Electronic computers. Computer science; QA76 Computer software |
description |
People frequently browse the internet on smartphones, laptops, or other computers to search for information on the web. As the amount of information on the web increases, the number of web pages grows day by day. Automatic topic-based web page classification is used to manage this excessive number of web pages by assigning them to categories based on their content. Various machine learning algorithms have been employed as web page classifiers to categorise web pages. However, there is a lack of studies that review web page classification using deep learning. This study reviews the approaches to automatic topic-based web page classification using deep learning that key researchers have proposed. The relevant research papers were selected from reputable research databases. The review examined the dataset, features, algorithm, and pre-processing used in web page classification, as well as the document representation technique and the performance of the classification model. The document representation technique used to represent web page features is an important aspect of web page classification because it affects the performance of the classification model. The integral web page feature is the textual content. The review found that image-based web page classification showed higher performance than text-based web page classification. Because there is a lack of matrix representations that can effectively handle long web page text content, a new document representation technique, the word cloud image, can be used to visualise the words extracted from the text content of a web page. |
format |
Article |
author |
Siti Hawa, Apandi; Jamaludin, Sallim; Rozlina, Mohamed; Norkhairi, Ahmad |
author_facet |
Siti Hawa, Apandi; Jamaludin, Sallim; Rozlina, Mohamed; Norkhairi, Ahmad |
author_sort |
Siti Hawa, Apandi |
title |
Automatic Topic-Based Web Page Classification Using Deep Learning |
title_sort |
automatic topic-based web page classification using deep learning |
publisher |
Politeknik Negeri Padang |
publishDate |
2023 |
url |
http://umpir.ump.edu.my/id/eprint/40036/1/Automatic%20Topic-Based%20Web%20Page%20Classification%20Using%20Deep%20Learning.pdf http://umpir.ump.edu.my/id/eprint/40036/ https://dx.doi.org/10.30630/joiv.7.3-2.1616 |
_version_ |
1822924085119680512 |
score |
13.232424 |