Experimental analysis in Hadoop MapReduce: A closer look at fault detection and recovery techniques

Bibliographic Details
Main Authors: Saadoon, Muntadher, Hamid, Siti Hafizah Ab, Sofian, Hazrina, Altarturi, Hamza, Nasuha, Nur, Azizul, Zati Hakim, Sani, Asmiza Abdul, Asemi, Adeleh
Format: Article
Published: MDPI 2021
Subjects: QD Chemistry; TA Engineering (General). Civil engineering (General)
Online Access: http://eprints.um.edu.my/33921/
id my.um.eprints.33921
record_format eprints
citation Saadoon, Muntadher and Hamid, Siti Hafizah Ab and Sofian, Hazrina and Altarturi, Hamza and Nasuha, Nur and Azizul, Zati Hakim and Sani, Asmiza Abdul and Asemi, Adeleh (2021) Experimental analysis in Hadoop MapReduce: A closer look at fault detection and recovery techniques. Sensors, 21 (11). ISSN 1424-8220, DOI https://doi.org/10.3390/s21113799
publication_date 2021-06
status Peer reviewed
datestamp 2022-07-12T04:45:50Z
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Research Repository
url_provider http://eprints.um.edu.my/
topic QD Chemistry
TA Engineering (General). Civil engineering (General)
description Hadoop MapReduce reactively detects and recovers from faults after they occur, relying on static heartbeat detection and re-execution from scratch. However, these techniques lead to excessive response time penalties and inefficient resource consumption during detection and recovery. Existing fault-tolerance solutions attempt to mitigate these limitations without considering critical conditions such as fail-slow faults, the impact of faults at different infrastructure levels, and the relationship between the detection and recovery stages. This paper analyses the response time under two main conditions, fail-stop and fail-slow, when they manifest at the node, service, and task levels at runtime. In addition, we focus on the relationship between the time taken to detect faults and the time taken to recover from them. The experimental analysis is conducted on a real Hadoop cluster comprising the MapReduce, YARN and HDFS frameworks. Our analysis shows that recovering from a single fault incurs an average response time penalty of 67.6%. Even if the detection and recovery times are well tuned, data locality and resource availability must also be considered to obtain the optimum tolerance time and the lowest penalties.
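The abstract names two mechanisms, static heartbeat detection and re-execution from scratch, and reports results as a response time penalty. The Python sketch below is a simplified, hypothetical model of those ideas, not the authors' experimental setup and not Hadoop's implementation. A detector with a fixed timeout flags a worker only after a full timeout of silence; a fail-slow worker that keeps sending heartbeats is never flagged; and a task restarted from zero pays for the discarded work plus the detection delay. The penalty is assumed here to be the extra response time as a percentage of the fault-free response time; under that assumption, a 67.6% penalty means a job took about 1.676 times as long as its fault-free run. The class and function names, the 30 s timeout and all timings are illustrative only. Hadoop exposes fixed timeouts of this general kind as configuration properties (for example `mapreduce.task.timeout`).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StaticHeartbeatDetector:
    """Hypothetical detector: a worker is declared failed after `timeout` seconds without a heartbeat.

    The timeout is fixed (static); it does not adapt to load or to how slowly a worker runs.
    """

    timeout: float
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, worker: str, now: float) -> None:
        # Record the arrival time of this worker's most recent heartbeat.
        self.last_seen[worker] = now

    def failed_workers(self, now: float) -> list[str]:
        # Flag only workers that have been silent for longer than the timeout.
        return sorted(w for w, t in self.last_seen.items() if now - t > self.timeout)


def rerun_from_scratch(task_time: float, fault_at: float, detection_delay: float) -> float:
    """Completion time of a task that fails at `fault_at` and is restarted from zero once detected."""
    # Work done before the fault is discarded, so the full task time is paid again.
    return fault_at + detection_delay + task_time


def response_time_penalty(baseline: float, with_fault: float) -> float:
    """Assumed definition: extra response time as a percentage of the fault-free response time."""
    return (with_fault - baseline) * 100.0 / baseline


if __name__ == "__main__":
    detector = StaticHeartbeatDetector(timeout=30.0)
    detector.heartbeat("node-a", now=0.0)  # node-a crashes after this heartbeat (fail-stop)
    detector.heartbeat("node-b", now=0.0)  # node-b keeps running, but slowly (fail-slow)
    detector.heartbeat("node-b", now=20.0)
    print(detector.failed_workers(now=25.0))  # [] : node-a silent for 25 s, under the timeout
    detector.heartbeat("node-b", now=40.0)
    print(detector.failed_workers(now=45.0))  # ['node-a'] : the slow node-b is never flagged

    finish = rerun_from_scratch(task_time=100.0, fault_at=50.0, detection_delay=30.0)
    print(finish)  # 180.0
    print(response_time_penalty(baseline=100.0, with_fault=finish))  # 80.0
```

In this toy model the timeout is the only tuning knob: shortening it speeds up fail-stop detection but risks flagging healthy workers that are merely busy, while no timeout value catches a fail-slow worker that still heartbeats. Restarting from scratch also makes the penalty grow with how late in the task the fault occurs.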