MARC View: Text Extraction Algorithm for Web Text Classification

Bibliographic Details
Main Author:	Theab, Mustafa Muwafak
Format:	Thesis
Language:	English
Published:	2010
Subjects:	QA71-90 Instruments and machines
Online Access:	http://etd.uum.edu.my/2164/1/Mustafa_Muwafak_Theab.pdf http://etd.uum.edu.my/2164/ http://lintas.uum.edu.my:8080/elmu/index.jsp?module=webopac-l&action=fullDisplayRetriever.jsp&szMaterialNo=0000757917

id	my.uum.etd.2164
record_format	eprints
spelling	my.uum.etd.2164 2013-07-24T12:14:42Z http://etd.uum.edu.my/2164/ Text Extraction Algorithm for Web Text Classification Theab, Mustafa Muwafak QA71-90 Instruments and machines The explosive expansion of web pages on the World Wide Web makes it difficult for search engines and web directories to return results relevant to user requirements. Web pages therefore need automatic classification techniques with high classification accuracy. This study provides a text extraction algorithm for web text classification. The extraction algorithm consists of three phases, namely web page extraction, rule formulation, and algorithm validation. A text extraction prototype was built using Visual C# 2008 to validate the algorithm. It is a Windows application combined with a web connection protocol. The prototype can create both a binary data set and a term frequency-inverse document frequency (tf-idf) data set. In this study, the experiment was conducted on five English-language educational websites. The created data sets were then classified using the Naive Bayes and C4.5 algorithms provided in the WEKA application. The experimental results show that the Naive Bayes classifier combined with the web text extraction algorithm performs best for web text classification among the methods compared. 2010 Thesis NonPeerReviewed application/pdf en http://etd.uum.edu.my/2164/1/Mustafa_Muwafak_Theab.pdf Theab, Mustafa Muwafak (2010) Text Extraction Algorithm for Web Text Classification. Master's thesis, Universiti Utara Malaysia. http://lintas.uum.edu.my:8080/elmu/index.jsp?module=webopac-l&action=fullDisplayRetriever.jsp&szMaterialNo=0000757917
institution	Universiti Utara Malaysia
building	UUM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Utara Malaysia
content_source	UUM Electronic Theses
url_provider	http://etd.uum.edu.my/
language	English
topic	QA71-90 Instruments and machines
spellingShingle	QA71-90 Instruments and machines Theab, Mustafa Muwafak Text Extraction Algorithm for Web Text Classification
description	The explosive expansion of web pages on the World Wide Web makes it difficult for search engines and web directories to return results relevant to user requirements. Web pages therefore need automatic classification techniques with high classification accuracy. This study provides a text extraction algorithm for web text classification. The extraction algorithm consists of three phases, namely web page extraction, rule formulation, and algorithm validation. A text extraction prototype was built using Visual C# 2008 to validate the algorithm. It is a Windows application combined with a web connection protocol. The prototype can create both a binary data set and a term frequency-inverse document frequency (tf-idf) data set. In this study, the experiment was conducted on five English-language educational websites. The created data sets were then classified using the Naive Bayes and C4.5 algorithms provided in the WEKA application. The experimental results show that the Naive Bayes classifier combined with the web text extraction algorithm performs best for web text classification among the methods compared.
format	Thesis
author	Theab, Mustafa Muwafak
author_facet	Theab, Mustafa Muwafak
author_sort	Theab, Mustafa Muwafak
title	Text Extraction Algorithm for Web Text Classification
title_short	Text Extraction Algorithm for Web Text Classification
title_full	Text Extraction Algorithm for Web Text Classification
title_fullStr	Text Extraction Algorithm for Web Text Classification
title_full_unstemmed	Text Extraction Algorithm for Web Text Classification
title_sort	text extraction algorithm for web text classification
publishDate	2010
url	http://etd.uum.edu.my/2164/1/Mustafa_Muwafak_Theab.pdf http://etd.uum.edu.my/2164/ http://lintas.uum.edu.my:8080/elmu/index.jsp?module=webopac-l&action=fullDisplayRetriever.jsp&szMaterialNo=0000757917
_version_	1644276611358392320
score	13.250246
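The description says the prototype (written in Visual C# 2008) extracts text from web pages and builds binary and tf-idf data sets that are then classified in WEKA. The Python sketch below shows what those steps and representations typically look like. It is not the thesis's implementation, and this record does not describe the thesis's extraction rules. Everything in the sketch is an assumption: the extraction rule (drop markup plus `<script>`/`<style>` content), the regex tokenizer, the tf-idf variant (raw count times log(N/df)), the `t_` attribute prefix, all function and class names, and the sample pages and labels.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Hypothetical helper: collects text outside <script> and <style> elements."""

    SKIPPED = ("script", "style")

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def extract_text(html):
    """Assumed extraction rule: drop markup, scripts and styles; keep visible text."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(docs):
    """Sorted list of every distinct term in the collection."""
    return sorted({term for doc in docs for term in tokenize(doc)})


def binary_dataset(docs, vocab):
    """Binary representation: 1 if the term occurs in the document, else 0."""
    rows = []
    for doc in docs:
        present = set(tokenize(doc))
        rows.append([1 if term in present else 0 for term in vocab])
    return rows


def tfidf_dataset(docs, vocab):
    """tf-idf variant assumed here: raw term count times log(N / df)."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    df = {term: sum(1 for c in counts if term in c) for term in vocab}
    return [[c[term] * math.log(n_docs / df[term]) for term in vocab] for c in counts]


def to_arff(relation, vocab, rows, labels):
    """Write rows and class labels in WEKA's ARFF format; terms get a 't_' prefix."""
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE t_{term} NUMERIC" for term in vocab]
    lines.append("@ATTRIBUTE class {" + ",".join(sorted(set(labels))) + "}")
    lines += ["", "@DATA"]
    for row, label in zip(rows, labels):
        lines.append(",".join(f"{v:g}" for v in row) + f",{label}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented example pages and labels, not data from the thesis.
    pages = [
        "<p>Admission fees and admission dates</p>",
        "<p>Course syllabus and lecture notes</p>",
        "<p>Lecture timetable</p><script>track();</script>",
    ]
    labels = ["admissions", "courses", "courses"]
    docs = [extract_text(page) for page in pages]
    vocab = build_vocabulary(docs)
    print(to_arff("webpages_binary", vocab, binary_dataset(docs, vocab), labels))
    print(to_arff("webpages_tfidf", vocab, tfidf_dataset(docs, vocab), labels))
```

The `t_` prefix keeps a term such as "class" from colliding with the class attribute. Declaring the binary attributes as `{0,1}` instead of `NUMERIC` would make WEKA treat them as nominal. This record does not say which form the thesis used.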
