A clickstream-based web page significance ranking metric for web crawlers
The unpredictable, fast-growing scale of the World Wide Web and its non-static nature pose considerable obstacles for Web crawlers, including incorrect and irrelevant answers among search results and scaling issues. Hence, more promising solutions are in dem...
Main Authors: | Selamat, Ali; Ahmadi-Abkenari, Fatemeh |
---|---|
Format: | Conference or Workshop Item |
Published: | 2011 |
Online Access: | http://eprints.utm.my/id/eprint/45469/ http://dx.doi.org/10.1109/MySEC.2011.6140674 |
id |
my.utm.45469 |
---|---|
record_format |
eprints |
spelling |
my.utm.45469 2017-08-29T00:59:19Z http://eprints.utm.my/id/eprint/45469/ A clickstream-based web page significance ranking metric for web crawlers Selamat, Ali; Ahmadi-Abkenari, Fatemeh The unpredictable, fast-growing scale of the World Wide Web and its non-static nature pose considerable obstacles for Web crawlers, including incorrect and irrelevant answers among search results and scaling issues. Hence, more promising solutions are in demand to provide more accurate search outcomes. Implementing existing Web page importance metrics, whether link-based or context-based, within a parallel crawler cannot fully address the coverage of authorized fresh Web content or the accuracy concerns, so employing these metrics is not the final approach within search engines' architecture. This paper proposes an analysis of clickstream data to discover the popularity of Web pages in the crawl frontier, by proposing the metric itself and presenting experimental results on ranking the UTM Web pages based on the proposed metric. 2011 Conference or Workshop Item PeerReviewed Selamat, Ali and Ahmadi-Abkenari, Fatemeh (2011) A clickstream-based web page significance ranking metric for web crawlers. In: The 5th Malaysian Software Engineering Conference (Mysec 2011). http://dx.doi.org/10.1109/MySEC.2011.6140674 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
description |
The unpredictable, fast-growing scale of the World Wide Web and its non-static nature pose considerable obstacles for Web crawlers, including incorrect and irrelevant answers among search results and scaling issues. Hence, more promising solutions are in demand to provide more accurate search outcomes. Implementing existing Web page importance metrics, whether link-based or context-based, within a parallel crawler cannot fully address the coverage of authorized fresh Web content or the accuracy concerns, so employing these metrics is not the final approach within search engines' architecture. This paper proposes an analysis of clickstream data to discover the popularity of Web pages in the crawl frontier, by proposing the metric itself and presenting experimental results on ranking the UTM Web pages based on the proposed metric. |
format |
Conference or Workshop Item |
author |
Selamat, Ali; Ahmadi-Abkenari, Fatemeh |
spellingShingle |
Selamat, Ali Ahmadi-Abkenari, Fatemeh A clickstream-based web page significance ranking metric for web crawlers |
author_facet |
Selamat, Ali; Ahmadi-Abkenari, Fatemeh |
author_sort |
Selamat, Ali |
title |
A clickstream-based web page significance ranking metric for web crawlers |
title_short |
A clickstream-based web page significance ranking metric for web crawlers |
title_full |
A clickstream-based web page significance ranking metric for web crawlers |
title_fullStr |
A clickstream-based web page significance ranking metric for web crawlers |
title_full_unstemmed |
A clickstream-based web page significance ranking metric for web crawlers |
title_sort |
clickstream-based web page significance ranking metric for web crawlers |
publishDate |
2011 |
url |
http://eprints.utm.my/id/eprint/45469/ http://dx.doi.org/10.1109/MySEC.2011.6140674 |
_version_ |
1643651749078106112 |
score |
13.211869 |