Indexing strategies of MapReduce for information retrieval in big data
Main Author: | Ramadhan, Mazen Farid Ebrahim |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2016 |
Online Access: | http://psasir.upm.edu.my/id/eprint/66723/1/FSKTM%202016%2025%20IR.pdf http://psasir.upm.edu.my/id/eprint/66723/ |
id |
my.upm.eprints.66723 |
---|---|
record_format |
eprints |
spelling |
my.upm.eprints.66723 2019-01-31T02:28:06Z http://psasir.upm.edu.my/id/eprint/66723/ Indexing strategies of MapReduce for information retrieval in big data Ramadhan, Mazen Farid Ebrahim 2016-01 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/66723/1/FSKTM%202016%2025%20IR.pdf Ramadhan, Mazen Farid Ebrahim (2016) Indexing strategies of MapReduce for information retrieval in big data. Masters thesis, Universiti Putra Malaysia. |
institution |
Universiti Putra Malaysia |
building |
UPM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Putra Malaysia |
content_source |
UPM Institutional Repository |
url_provider |
http://psasir.upm.edu.my/ |
language |
English |
description |
In Information Retrieval (IR), efficiently indexing large, terabyte-scale datasets
remains an open problem because of information overload, driven by the growth of
knowledge and by the increasing number of media types, platforms, and
interoperating platforms. MapReduce, which distributes data-intensive operations
over multiple processing machines, has been suggested as a suitable platform for
this task. In this project, two MapReduce indexing strategies, Sensei and the
per-posting list indexing of Terrier, will be analysed, as they are the two most
efficient MapReduce indexing strategies. Both strategies will be implemented in
an existing IR framework, and an experiment will be performed on Hadoop MapReduce
using the same large dataset for both, in order to determine which of Sensei and
Terrier is more efficient. The experiment will measure retrieval performance as
the dataset size and the processing power increase, examining how each indexing
strategy scales with larger datasets and with a larger number of distributed
machines. Throughput will be measured in MB/s (megabytes per second), and the
results will be analysed in terms of delay, time consumed, and efficiency for
Sensei and for Terrier's per-posting list indexing. |
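The abstract names Terrier's per-posting list indexing without describing how it works. The following is a minimal in-process Python sketch, assuming the strategy works as it is commonly described: each map task builds partial posting lists for its input split and emits one (term, partial posting list) pair per distinct term, and each reducer merges a term's partial lists in document-ID order. It does not use Hadoop or Terrier; `map_task`, `shuffle`, `reduce_task`, `throughput_mb_per_s`, and the toy corpus are hypothetical names for illustration, document IDs are assumed to be globally unique (a simplification), and the MB/s helper assumes 1 MB = 10^6 bytes. Sensei's strategy is not modelled.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
import time

# A posting is (doc_id, term_frequency); doc IDs are assumed globally unique.
Posting = tuple[int, int]


def map_task(docs: list[tuple[int, str]]) -> list[tuple[str, list[Posting]]]:
    """Build partial posting lists for one input split (hypothetical helper).

    Emits one (term, partial posting list) pair per distinct term in the
    split rather than one pair per token, so fewer pairs are shuffled.
    """
    partial: dict[str, list[Posting]] = defaultdict(list)
    for doc_id, text in docs:
        tf: dict[str, int] = defaultdict(int)
        for token in text.lower().split():  # naive whitespace tokenizer
            tf[token] += 1
        for term, freq in tf.items():
            partial[term].append((doc_id, freq))
    return list(partial.items())


def shuffle(map_outputs) -> dict[str, list[list[Posting]]]:
    """Group emitted pairs by term, standing in for Hadoop's shuffle/sort."""
    pairs = sorted((pair for out in map_outputs for pair in out), key=itemgetter(0))
    return {
        term: [plist for _, plist in group]
        for term, group in groupby(pairs, key=itemgetter(0))
    }


def reduce_task(term: str, partial_lists: list[list[Posting]]) -> tuple[str, list[Posting]]:
    """Merge one term's partial posting lists into a single list ordered by doc_id."""
    merged = [posting for plist in partial_lists for posting in plist]
    merged.sort(key=itemgetter(0))
    return term, merged


def throughput_mb_per_s(num_bytes: int, seconds: float) -> float:
    """Indexing throughput in MB/s, assuming 1 MB = 10**6 bytes."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes / 1e6 / seconds


if __name__ == "__main__":
    # Two hypothetical input splits, one per map task.
    splits = [
        [(1, "big data indexing"), (2, "MapReduce indexing")],
        [(3, "big data retrieval")],
    ]
    start = time.perf_counter()
    grouped = shuffle(map(map_task, splits))
    index = dict(reduce_task(term, lists) for term, lists in grouped.items())
    elapsed = time.perf_counter() - start
    size = sum(len(text.encode("utf-8")) for split in splits for _, text in split)
    print(index["indexing"])  # [(1, 1), (2, 1)]
    print(index["big"])  # [(1, 1), (3, 1)]
    print(f"{throughput_mb_per_s(size, elapsed):.6f} MB/s")
```

Emitting one pair per term per split, instead of one pair per token, is what this approach relies on to cut shuffle volume. The MB/s figure printed for a toy corpus like this is dominated by Python overhead and says nothing about throughput on a Hadoop cluster.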
format |
Thesis |
author |
Ramadhan, Mazen Farid Ebrahim |
title |
Indexing strategies of MapReduce for information retrieval in big data |
publishDate |
2016 |
url |
http://psasir.upm.edu.my/id/eprint/66723/1/FSKTM%202016%2025%20IR.pdf http://psasir.upm.edu.my/id/eprint/66723/ |