Data transmission performance analysis in cloud and grid



Bibliographic Details
Main Authors: Abdulkarem, Mohammed; Latip, Rohaya
Format: Article
Published: Asian Research Publishing Network 2015
Online Access: http://psasir.upm.edu.my/id/eprint/44237/
https://www.arpnjournals.com/jeas/volume_18_2015.htm
Description
Summary: The Hadoop Distributed File System (HDFS) and the MapReduce programming model are used to store and retrieve big data. Terabyte-sized files can easily be stored in HDFS and analyzed with MapReduce. In recent years HDFS has become increasingly popular as a key building block of integrated grid storage solutions in scientific computing. However, because HDFS does not support asynchronous writes, it is widely confirmed that a single stream per GridFTP transfer is the best solution for sustaining high throughput in WAN transfers. GridFTP, implemented in the Globus Toolkit, is one of the most popular protocols for performing data transfers in grid environments. In this paper, we take on the challenge of integrating Hadoop with the grid by proposing a new framework, called Grid-over-Hadoop, that retains the features of Hadoop and uses GridFTP for data transfer.
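One plausible reading of the single-stream recommendation above is as follows. An HDFS file is written sequentially and cannot accept writes at arbitrary offsets. Parallel GridFTP streams, by contrast, deliver blocks tagged with file offsets, and those blocks may arrive out of order. The Python sketch below simulates that mismatch under simplifying assumptions: fixed-size blocks are striped round-robin across the streams, and one stream lags the others by a fixed number of rounds. `SequentialSink`, `split_into_streams`, and `transfer` are hypothetical names used only for illustration. This is not the authors' Grid-over-Hadoop code, the Hadoop API, or a GridFTP implementation.

```python
from typing import Dict, List, Tuple

Block = Tuple[int, bytes]  # (file offset, payload)


class SequentialSink:
    """Hypothetical stand-in for an append-only, HDFS-style file: bytes can
    only be added at the current end of the file, never at an arbitrary offset."""

    def __init__(self) -> None:
        self.data = bytearray()              # bytes already written, in order
        self.pending: Dict[int, bytes] = {}  # early blocks waiting for the gap before them
        self.peak_buffered = 0               # most bytes ever held in memory at once

    def receive(self, offset: int, block: bytes) -> None:
        self.pending[offset] = block
        # Append every block that now continues the file without a gap.
        while len(self.data) in self.pending:
            self.data += self.pending.pop(len(self.data))
        buffered = sum(len(b) for b in self.pending.values())
        self.peak_buffered = max(self.peak_buffered, buffered)


def split_into_streams(payload: bytes, block_size: int, n_streams: int) -> List[List[Block]]:
    """Stripe fixed-size blocks round-robin across n_streams parallel streams."""
    blocks = [(off, payload[off:off + block_size])
              for off in range(0, len(payload), block_size)]
    return [blocks[i::n_streams] for i in range(n_streams)]


def transfer(payload: bytes, block_size: int, n_streams: int,
             slow_stream_delay: int = 0) -> int:
    """Deliver the payload over n_streams and return the peak receive buffer.

    Stream 0 starts `slow_stream_delay` rounds late, a crude model of one
    WAN path being slower than the others (an assumption of this sketch).
    """
    streams = split_into_streams(payload, block_size, n_streams)
    arrivals: List[Block] = []
    rounds = max(len(s) for s in streams) + slow_stream_delay
    for r in range(rounds):
        for i, stream in enumerate(streams):
            idx = r - slow_stream_delay if i == 0 else r
            if 0 <= idx < len(stream):
                arrivals.append(stream[idx])

    sink = SequentialSink()
    for offset, block in arrivals:
        sink.receive(offset, block)
    assert bytes(sink.data) == payload  # every byte landed exactly once, in order
    return sink.peak_buffered


if __name__ == "__main__":
    payload = bytes(range(64))
    print("1 stream :", transfer(payload, 4, 1, slow_stream_delay=2))
    print("4 streams:", transfer(payload, 4, 4, slow_stream_delay=2))
```

With the same sixteen 4-byte blocks, the single stream never buffers anything. Four striped streams with one stream lagging by two rounds must hold up to 24 bytes (six blocks) in memory before they can be appended. The buffer grows with the number of streams and with the lag. This is consistent with the abstract's point that a single stream per transfer suits an HDFS destination.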