Replica Creation Algorithm for Data Grids

Bibliographic Details
Main Author: Madi, Mohammed Kamel
Format: Thesis
Language: English
Published: 2012
Online Access:https://etd.uum.edu.my/3352/1/MOHAMMED_KAMEL_MADI.pdf
https://etd.uum.edu.my/3352/3/MOHAMMED_KAMEL_MADI.pdf
https://etd.uum.edu.my/3352/
http://sierra.uum.edu.my/record=b1239722~S1
Description
Summary: A data grid system is a data management infrastructure that facilitates reliable access to and sharing of large amounts of data, storage resources, and data transfer services, and that can be scaled across distributed locations. This thesis presents a new replication algorithm that improves data access performance in data grids by distributing relevant data copies around the grid. The new Data Replica Creation Algorithm (DRCM) improves the performance of data grid systems by reducing job execution time and making the best use of data grid resources (network bandwidth and storage space). Current algorithms focus on the number of accesses when deciding which files to replicate and where to place them, ignoring the capabilities of the resources. DRCM differs by considering both user and resource perspectives, strategically placing replicas at the locations that provide the lowest transfer cost. The proposed algorithm uses three strategies: the Replica Creation and Deletion Strategy (RCDS), the Replica Placement Strategy (RPS), and the Replica Replacement Strategy (RRS). DRCM was evaluated with network simulation (OptorSim) using selected performance metrics (mean job execution time, efficient network usage, average storage usage, and computing element usage), scenarios, and topologies. The results showed better job execution time with lower resource consumption than existing approaches. This research contributes replication strategies, embodied in a single algorithm, that enhance data grid performance and can decide to create or delete more than one file within the same decision. Furthermore, a dependency-level-between-files criterion was integrated with the exponential growth/decay model to give an accurate file evaluation.
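
To make two ideas from the abstract concrete, the minimal Python sketch below illustrates (a) scoring a file by combining an exponential growth/decay of its access history with a dependency level between files, and (b) placing a replica at the candidate site with the lowest transfer cost. All function names, parameters (decay_rate, dep_weight), and data structures here are illustrative assumptions; the thesis defines the actual DRCM strategies (RCDS, RPS, RRS) in full.

```python
import math

# Illustrative sketch only: names, weights, and data structures are
# assumptions for exposition, not the thesis's actual DRCM implementation.

def file_value(access_count, age, dependency_level,
               decay_rate=0.1, dep_weight=0.5):
    """Score a file by combining an exponentially decayed access count
    (the exponential growth/decay model) with the file's dependency
    level to other files, as named in the abstract."""
    recency = access_count * math.exp(-decay_rate * age)
    return recency + dep_weight * dependency_level

def cheapest_site(candidate_sites, transfer_cost, free_space, file_size):
    """Replica placement idea: among sites with enough free storage,
    pick the one with the lowest transfer cost to the requesting site."""
    feasible = [s for s in candidate_sites if free_space[s] >= file_size]
    return min(feasible, key=lambda s: transfer_cost[s]) if feasible else None

# Hypothetical example: replicate a highly valued file to the cheapest feasible site.
if __name__ == "__main__":
    sites = ["siteA", "siteB", "siteC"]
    cost = {"siteA": 12.0, "siteB": 4.5, "siteC": 7.2}   # assumed transfer costs
    space = {"siteA": 500, "siteB": 80, "siteC": 300}    # assumed free storage (GB)
    print(file_value(access_count=40, age=3, dependency_level=2))
    print(cheapest_site(sites, cost, space, file_size=100))
```

In this sketch, a file whose score crosses some threshold would be replicated to the returned site; how DRCM actually combines these criteria, and how it handles creation and deletion of several files in one decision, is specified in the thesis itself.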