Performance analysis of bacterial genome assemblers using Illumina next generation sequencing data / Nur ‘ Ain Mohd Ishak

Bibliographic Details
Main Author: Nur ‘ Ain, Mohd Ishak
Format: Thesis
Published: 2020
Online Access: http://studentsrepo.um.edu.my/12724/1/Nur_'ain.pdf
http://studentsrepo.um.edu.my/12724/2/Nur_%E2%80%98ain.pdf
http://studentsrepo.um.edu.my/12724/
Description
Summary: The advancement of next generation sequencing (NGS) technology has revolutionized the field of genomic and genetic studies. Compared with conventional methods, NGS generates comprehensive genomic data at a fraction of the cost and with higher accuracy. One of the key steps in processing and analyzing NGS data is genome assembly. De novo assembly is the process of assembling short reads into contiguous sections of sequence without a reference, which distinguishes it from conventional mapping techniques. The de Bruijn graph is one of the assembly algorithms most widely used for short-read sequences produced by NGS platforms. This study reports the performance of four de novo assemblers (SPAdes, ABySS, Velvet and MaSuRCA), each of which applies a variant of the de Bruijn graph algorithm, on genomic data generated by the Illumina sequencing platform. The computational performance of the assemblers was compared in terms of running time. The assembled contigs and scaffolds were also evaluated on several quality measures, specifically their length and the contiguity of the assembly, using ABySS-fac. Results showed that on single-end data sets, MaSuRCA and SPAdes generally produced the best results among the four assemblers, with the highest percentage of contigs equal to or longer than 500 bp, the highest total base pairs, the highest N50 and the lowest L50 in most cases. For paired-end data sets, Velvet was suitable for assembling all seven bacterial genome sequences. This comparative study advances current knowledge of de novo genome assembly, which is the first step toward characterizing and revealing whole-genome information. In addition, this work provides a practical guideline that could help researchers identify the appropriate assembler(s) for their research projects.
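
The contiguity measures named in the summary (percentage of contigs of at least 500 bp, total base pairs, N50 and L50) can be illustrated with a minimal Python sketch. This is not the thesis's own code and not ABySS-fac itself. The function name `contiguity_stats`, its `min_len` parameter and the example lengths are hypothetical, and the sketch assumes the statistics are computed only over contigs that pass the 500 bp cutoff.

```python
def contiguity_stats(lengths, min_len=500):
    """Summarize an assembly from a list of contig lengths (in bp).

    Assumption: total length, N50 and L50 are computed only over
    contigs of at least `min_len` bp (500 bp, as in the summary).
    """
    # Keep contigs that pass the cutoff, longest first.
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    total = sum(kept)

    # N50: length of the contig at which the running sum, taken from the
    # longest contig downward, first reaches half of the total length.
    # L50: how many contigs were needed to reach that point.
    n50 = 0
    l50 = 0
    running = 0
    for count, n in enumerate(kept, start=1):
        running += n
        if running * 2 >= total:
            n50, l50 = n, count
            break

    # Share of all input contigs that pass the cutoff, as a percentage.
    pct_kept = 100.0 * len(kept) / len(lengths) if lengths else 0.0

    return {
        "contigs_total": len(lengths),
        "contigs_kept": len(kept),
        "pct_kept": pct_kept,
        "total_bp": total,
        "n50": n50,
        "l50": l50,
    }


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    example = [100, 400, 500, 1000, 2000, 3000]
    print(contiguity_stats(example))
```

For the hypothetical lengths above, four of the six contigs pass the cutoff and sum to 6500 bp. The two longest contigs together cover more than half of that total, so N50 is 2000 and L50 is 2. A higher N50 and a lower L50 both indicate a more contiguous assembly, which is why the summary reports them in opposite directions.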