Description: Effectiveness of RSS feed item duplication detection using word matching

Bibliographic Details
Main Authors:	Tan, Ian K. T., Su, Tze-Wei, Khor, Hao-Ming, Ong, Ee-Chun
Format:	Article
Language:	English
Published:	Sunway University 2011
Subjects:	QA76 Computer software
Online Access:	http://eprints.sunway.edu.my/387/1/SAJ_8_2011_38-53.pdf http://eprints.sunway.edu.my/387/

Description
Summary:	Users of feed aggregators know that duplicated articles occasionally appear in the feeds they subscribe to. Reading every article only to stumble upon items already read is time consuming. Our work here is to determine the effectiveness of using basic word matching to remove duplicated items and show only the most relevant item, thus saving readers' time. The method described in this paper removes duplicates using word matching heuristics with an appropriate matching percentage. The duplicated items are then ranked so that only the highest ranked article is displayed. Ranking is done using the number of search items found on the titles of the news feeds, where the article with the highest number returned is considered the highest ranked. Using Malaysian online news feeds, we found that a matching percentage of 40% minimizes duplicates effectively with minimal errors. We conducted further empirical studies using 9 technology blog feeds over a longer period to obtain better-averaged results. The matching percentage obtained is within the same range. The method has a low processing overhead, and with careful selection of the matching percentage, the system effectively removes the majority of duplicates.
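
The word matching idea in the summary can be illustrated with a minimal sketch. This record does not give the paper's exact heuristics, so everything below is an assumption made for illustration: the function names (`words`, `match_percentage`, `group_duplicates`), the tokenization, the choice to measure overlap against the shorter item's word set, and the greedy grouping. Only the 40% threshold comes from the summary. The ranking step is omitted because it depends on search result counts that are not described here.

```python
import re


def words(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_percentage(a, b):
    """Percentage of the shorter item's distinct words found in the other item.

    This particular definition is an assumption, not the paper's formula.
    """
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return 100.0 * len(wa & wb) / min(len(wa), len(wb))


def group_duplicates(items, threshold=40.0):
    """Greedily group items that match a group's first item at or above threshold."""
    groups = []
    for item in items:
        for group in groups:
            if match_percentage(item, group[0]) >= threshold:
                group.append(item)
                break
        else:
            # No existing group matched, so this item starts a new group.
            groups.append([item])
    return groups


# Hypothetical feed item titles, made up for this example.
titles = [
    "Petrol price to rise next week",
    "Petrol price will rise from next week",
    "New tablet launched in Kuala Lumpur",
]
groups = group_duplicates(titles)
```

With these made-up titles, the first two share most of their words and fall into one group, while the third starts a group of its own. A lower threshold would merge more aggressively and risk treating distinct articles as duplicates, which is the trade-off the summary refers to when it mentions careful selection of the matching percentage.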
