Experimental analysis in Hadoop MapReduce: A closer look at fault detection and recovery techniques
Hadoop MapReduce reactively detects and recovers from faults after they occur, relying on static heartbeat detection and re-execution from scratch. However, these techniques lead to excessive response time penalties and inefficient resource consumption during detection and recovery. Existing fault-tolerance solutions aim to mitigate these limitations without considering critical conditions such as fail-slow faults, the impact of faults at different infrastructure levels, and the relationship between the detection and recovery stages. This paper analyses the response time under two main fault conditions, fail-stop and fail-slow, when they manifest at the node, service, and task levels at runtime. In addition, we focus on the relationship between the time taken to detect faults and the time taken to recover from them. The experimental analysis is conducted on a real Hadoop cluster comprising the MapReduce, YARN and HDFS frameworks. Our analysis shows that recovering from a single fault incurs an average response time penalty of 67.6%. Even when the detection and recovery times are well tuned, data locality and resource availability must also be considered to obtain the optimum tolerance time and the lowest penalties.
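The static detection and recovery behaviour the abstract refers to is governed by a handful of fixed Hadoop configuration properties. The sketch below is not taken from the paper; the class name and the use of a bare `Configuration` object are illustrative only, and the values shown are the framework's documented stock defaults rather than the authors' experimental settings.

```java
import org.apache.hadoop.conf.Configuration;

// Minimal sketch of the static heartbeat/timeout settings behind Hadoop's
// reactive fault handling; values are the stock defaults, not the paper's setup.
public class StaticHeartbeatDefaults {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // HDFS: DataNodes heartbeat the NameNode every 3 s; a node is declared dead only after
        // 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval (~10.5 min).
        conf.setLong("dfs.heartbeat.interval", 3);
        conf.setLong("dfs.namenode.heartbeat.recheck-interval", 300_000);

        // YARN: NodeManagers heartbeat the ResourceManager every 1 s, but the RM waits
        // the liveness-monitor expiry interval (10 min) before expiring a node.
        conf.setLong("yarn.resourcemanager.nodemanagers.heartbeat-interval-ms", 1_000);
        conf.setLong("yarn.nm.liveness-monitor.expiry-interval-ms", 600_000);

        // MapReduce: a task that reports no progress for mapreduce.task.timeout ms is killed
        // and re-executed from scratch, up to maxattempts attempts on another container.
        conf.setLong("mapreduce.task.timeout", 600_000);
        conf.setInt("mapreduce.map.maxattempts", 4);
        conf.setInt("mapreduce.reduce.maxattempts", 4);

        System.out.println("Task timeout (ms): " + conf.getLong("mapreduce.task.timeout", -1));
    }
}
```

With these defaults, a failed node can go undetected for on the order of ten minutes and a hung task for ten minutes before re-execution even begins, which illustrates the kind of fixed detection latency and re-execution cost the abstract describes.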
Main Authors: | Saadoon, Muntadher; Hamid, Siti Hafizah Ab; Sofian, Hazrina; Altarturi, Hamza; Nasuha, Nur; Azizul, Zati Hakim; Sani, Asmiza Abdul; Asemi, Adeleh |
---|---|
Format: | Article |
Published: | MDPI, 2021 |
Subjects: | QD Chemistry; TA Engineering (General). Civil engineering (General) |
Online Access: | http://eprints.um.edu.my/33921/ |
id | my.um.eprints.33921 |
---|---|
record_format | eprints |
spelling | my.um.eprints.33921 2022-07-12T04:45:50Z http://eprints.um.edu.my/33921/ Saadoon, Muntadher and Hamid, Siti Hafizah Ab and Sofian, Hazrina and Altarturi, Hamza and Nasuha, Nur and Azizul, Zati Hakim and Sani, Asmiza Abdul and Asemi, Adeleh (2021) Experimental analysis in Hadoop MapReduce: A closer look at fault detection and recovery techniques. Sensors, 21 (11). ISSN 1424-8220, DOI https://doi.org/10.3390/s21113799. MDPI, 2021-06, Article, PeerReviewed. |
institution | Universiti Malaya |
building | UM Library |
collection | Institutional Repository |
continent | Asia |
country | Malaysia |
content_provider | Universiti Malaya |
content_source | UM Research Repository |
url_provider | http://eprints.um.edu.my/ |
topic | QD Chemistry; TA Engineering (General). Civil engineering (General) |