An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing
Main Author: | Saufi, Bukhari |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2020 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | https://etd.uum.edu.my/8715/1/Deposit%20Permission_s900382.pdf https://etd.uum.edu.my/8715/2/s900382_01.pdf https://etd.uum.edu.my/8715/3/s900382_references.docx https://etd.uum.edu.my/8715/ |
id |
my.uum.etd.8715 |
---|---|
record_format |
eprints |
spelling |
my.uum.etd.8715 2021-10-07T05:51:34Z https://etd.uum.edu.my/8715/ Saufi, Bukhari (2020) An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing. Doctoral thesis, Universiti Utara Malaysia. |
institution |
Universiti Utara Malaysia |
building |
UUM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Utara Malaysia |
content_source |
UUM Electronic Theses |
url_provider |
http://etd.uum.edu.my/ |
language |
English |
topic |
QA75 Electronic computers. Computer science |
description |
Fault tolerance in grid computing allows the system to continue operating despite the occurrence of failures. Most fault tolerance algorithms focus on fault handling techniques such as task reprocessing, checkpointing, task replication, penalty, and task migration. Ant colony system (ACS), a variant of ant colony optimization (ACO), is one of the promising algorithms for fault tolerance because it can adapt to both static and dynamic combinatorial optimization problems. However, the ACS algorithm does not consider resource fitness during task scheduling, which leads to poor load balancing and a lower execution success rate. This research proposes dynamic ACS fault tolerance with suspension (DAFTS) for grid computing, which provides fault tolerance techniques aimed at improving the execution success rate and load balancing. The proposed algorithm consists of a dynamic evaporation rate, a resource fitness-based scheduling process, an enhanced pheromone update with a trust factor and suspension, and checkpoint-based task reprocessing. The research framework consists of four phases: identifying fault tolerance techniques, enhancing resource assignment and job scheduling, improving the fault tolerance algorithm, and evaluating the performance of the proposed algorithm. The proposed algorithm was developed in a simulated grid environment, GridSim, and evaluated against other fault tolerance algorithms, including trust-based ACO, fault tolerance ACO, ACO without fault tolerance and ACO with fault tolerance, in terms of total execution time, average latency, average makespan, throughput, execution success rate and load balancing.
Experimental results showed that the proposed algorithm achieved the best performance in most metrics and the second best in load balancing. As the failure rate increased, DAFTS showed the smallest increases in execution time, average makespan and average latency (7%, 11% and 5%, respectively) and the smallest decreases in throughput and execution success rate (6.49% and 9%, respectively). As the number of jobs increased, DAFTS showed the smallest growth in execution time, average makespan and average latency (5.8, 8.5 and 8.7 times, respectively), the highest increase in throughput (72.9%) and the highest execution success rate (93.7%). The proposed algorithm can effectively overcome load balancing problems and increase execution success rates in fault-prone distributed systems. |
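The abstract lists the components of DAFTS but not their formulas. The Python sketch below is only a rough illustration of how such pieces can fit into an ACS scheduler, under stated assumptions: the pseudo-random proportional selection rule and the local pheromone update follow the standard ACS formulation, while the fitness function (`fitness`), the failure-driven evaporation rate (`dynamic_rho`), the trust adjustments and the suspension threshold (`suspend_below`) are hypothetical stand-ins, not the definitions used in the thesis. Checkpoint-based task reprocessing is not modelled.

```python
import random
from dataclasses import dataclass


@dataclass
class Resource:
    """A grid resource; every field here is an illustrative assumption."""
    name: str
    mips: float              # processing speed (assumed unit: MIPS)
    load: float              # work already queued (assumed unit: MI)
    trust: float = 1.0       # hypothetical trust factor in [0, 1]
    pheromone: float = 1.0   # ACS pheromone level on this resource
    suspended: bool = False  # hypothetical suspension flag


def fitness(r: Resource, task_length: float) -> float:
    """Hypothetical fitness: trust divided by expected completion time."""
    expected_time = (r.load + task_length) / r.mips
    return r.trust / expected_time


def choose_resource(resources, task_length, q0=0.9, beta=2.0, rng=random):
    """ACS pseudo-random proportional rule over non-suspended resources."""
    candidates = [r for r in resources if not r.suspended]
    if not candidates:
        raise RuntimeError("no non-suspended resource available")
    scores = [r.pheromone * fitness(r, task_length) ** beta for r in candidates]
    if rng.random() < q0:
        # Exploitation: take the resource with the highest score.
        best = max(range(len(candidates)), key=lambda i: scores[i])
        return candidates[best]
    # Exploration: sample a resource with probability proportional to its score.
    return rng.choices(candidates, weights=scores, k=1)[0]


def dynamic_rho(failure_rate, rho_min=0.1, rho_max=0.5):
    """Hypothetical evaporation rate that grows linearly with the failure rate."""
    f = min(max(failure_rate, 0.0), 1.0)
    return rho_min + (rho_max - rho_min) * f


def local_update(r, xi=0.1, tau0=1.0):
    """Standard ACS local update, applied when a task is assigned to r."""
    r.pheromone = (1 - xi) * r.pheromone + xi * tau0


def global_update(r, succeeded, completion_time, rho,
                  suspend_below=0.2, tau_min=1e-3):
    """ACS-style evaporation/deposit plus a hypothetical trust and suspension rule."""
    # Successes raise trust slightly; failures lower it more sharply.
    r.trust = min(1.0, r.trust + 0.1) if succeeded else max(0.0, r.trust - 0.3)
    # Only successful executions deposit pheromone; failures just evaporate it.
    deposit = r.trust / completion_time if succeeded else 0.0
    # tau_min keeps pheromone positive so sampling weights never all reach zero.
    r.pheromone = max(tau_min, (1 - rho) * r.pheromone + rho * deposit)
    # Resources whose trust falls below the threshold leave the scheduling pool.
    r.suspended = r.trust < suspend_below


if __name__ == "__main__":
    rng = random.Random(42)
    grid = [Resource("r1", mips=1000, load=0), Resource("r2", mips=500, load=2000)]
    chosen = choose_resource(grid, task_length=4000, q0=1.0, rng=rng)
    print(chosen.name)  # r1: it is faster and idle, so its fitness is higher
    local_update(chosen)
    for _ in range(3):  # three failures drive r1's trust below 0.2
        global_update(chosen, succeeded=False, completion_time=4.0,
                      rho=dynamic_rho(0.5))
    print(grid[0].suspended, choose_resource(grid, 4000, q0=1.0, rng=rng).name)
```

In this sketch the global update is applied to each resource after every task outcome, which is a simplification: classic ACS applies its global update only to the best solution found so far.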
format |
Thesis |
author |
Saufi, Bukhari |
title |
An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing |
publishDate |
2020 |