An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing

Fault tolerance in grid computing allows the system to continue operating despite the occurrence of failures. Most fault tolerance algorithms focus on fault handling techniques such as task reprocessing, checkpointing, task replication, penalty, and task migration. Ant colony system (ACS), a variant of an...

Bibliographic Details
Main Author: Saufi, Bukhari
Format: Thesis
Language: English
Published: 2020
Subjects: QA75 Electronic computers. Computer science
Online Access: https://etd.uum.edu.my/8715/1/Deposit%20Permission_s900382.pdf
https://etd.uum.edu.my/8715/2/s900382_01.pdf
https://etd.uum.edu.my/8715/3/s900382_references.docx
https://etd.uum.edu.my/8715/
id my.uum.etd.8715
record_format eprints
spelling my.uum.etd.8715 2021-10-07T05:51:34Z https://etd.uum.edu.my/8715/ An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing Saufi, Bukhari QA75 Electronic computers. Computer science 2020 Thesis NonPeerReviewed text en https://etd.uum.edu.my/8715/1/Deposit%20Permission_s900382.pdf text en https://etd.uum.edu.my/8715/2/s900382_01.pdf text en https://etd.uum.edu.my/8715/3/s900382_references.docx Saufi, Bukhari (2020) An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing. Doctoral thesis, Universiti Utara Malaysia.
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Electronic Theses
url_provider http://etd.uum.edu.my/
language English
topic QA75 Electronic computers. Computer science
description Fault tolerance in grid computing allows the system to continue operating despite the occurrence of failures. Most fault tolerance algorithms focus on fault handling techniques such as task reprocessing, checkpointing, task replication, penalty, and task migration. Ant colony system (ACS), a variant of ant colony optimization (ACO), is a promising algorithm for fault tolerance because it can adapt to both static and dynamic combinatorial optimization problems. However, the ACS algorithm does not consider resource fitness during task scheduling, which leads to poor load balancing and a lower execution success rate. This research proposes dynamic ACS fault tolerance with suspension (DAFTS) for grid computing, which focuses on providing effective fault tolerance techniques to improve the execution success rate and load balancing. The proposed algorithm consists of a dynamic evaporation rate, a resource fitness-based scheduling process, an enhanced pheromone update with trust factor and suspension, and checkpoint-based task reprocessing. The research framework consists of four phases: identifying fault tolerance techniques, enhancing resource assignment and job scheduling, improving the fault tolerance algorithm, and evaluating the performance of the proposed algorithm. The proposed algorithm was developed in the GridSim simulated grid environment and evaluated against other fault tolerance algorithms, namely trust-based ACO, fault tolerance ACO, ACO without fault tolerance and ACO with fault tolerance, in terms of total execution time, average latency, average makespan, throughput, execution success rate and load balancing. Experimental results showed that the proposed algorithm achieved the best performance in most aspects and the second best in terms of load balancing. As the failure rate increased, DAFTS showed the smallest increases in execution time, average makespan and average latency (7%, 11% and 5%, respectively) and the smallest decreases in throughput and execution success rate (6.49% and 9%, respectively). As the number of jobs increased, DAFTS also showed the smallest increases in execution time, average makespan and average latency (5.8, 8.5 and 8.7 times, respectively), the highest increase in throughput (72.9%) and the highest execution success rate (93.7%). The proposed algorithm can effectively overcome load balancing problems and increase execution success rates in distributed systems that are prone to faults.
format Thesis
author Saufi, Bukhari
title An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing
publishDate 2020