De Novo assembly of an unknown geminivirus / Nurul Jannah Binti Mat @ Mohamad

Next-generation sequencing (NGS), also known as high-throughput sequencing, is now fast and cheap enough to be considered part of the toolbox for investigating unknown viruses. The Illumina Genome Analyzer is one of the next-generation sequencing platforms that produce a significantly larger vol...

Bibliographic Details
Main Author: Mat @ Mohamad, Nurul Jannah
Format: Thesis
Published: 2014
Subjects: Q Science (General)
Online Access: http://studentsrepo.um.edu.my/6420/1/Research_report%2DSGJ130002.pdf
http://studentsrepo.um.edu.my/6420/
id my.um.stud.6420
record_format eprints
spelling my.um.stud.6420 2016-10-22T09:16:35Z De Novo assembly of an unknown geminivirus / Nurul Jannah Binti Mat @ Mohamad Mat @ Mohamad, Nurul Jannah Q Science (General) 2014 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/6420/1/Research_report%2DSGJ130002.pdf Mat @ Mohamad, Nurul Jannah (2014) De Novo assembly of an unknown geminivirus / Nurul Jannah Binti Mat @ Mohamad. Masters thesis, University of Malaya. http://studentsrepo.um.edu.my/6420/
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Student Repository
url_provider http://studentsrepo.um.edu.my/
topic Q Science (General)
description Next-generation sequencing (NGS), also known as high-throughput sequencing, is now fast and cheap enough to be considered part of the toolbox for investigating unknown viruses. The Illumina Genome Analyzer is one of the next-generation sequencing platforms that produce a significantly larger volume of sequence data. The short sequence reads it generates can be used for de novo assembly. This study therefore performed a de novo assembly of an unknown geminivirus from Illumina Genome Analyzer reads. The assembly was carried out using SOAPdenovo, and only one scaffold (C11095) mapped to the geminivirus genomes. Genes were then predicted on this scaffold using GeneMark.hmm, which identified five open reading frames (ORFs) as genes. The function of each predicted gene was annotated using three annotation resources: InterPro, Gene Ontology (GO) and UniProt. For example, InterPro indicates that gene 1 encodes the geminivirus AL3 coat protein, UniProt indicates that it encodes the replication enhancement protein, and GO indicates that it is involved in the viral process (biological process). The predicted genes were compared with the geminivirus genomes using BRIG (BLAST Ring Image Generator). The BRIG image shows that a large stretch of the unknown geminivirus sequence is missing between 1000 bp and 1300 bp. The gene comparison shows that the unknown geminivirus resembles the other geminivirus genomes in that all of them encode the coat protein and the replication-associated protein. It differs from them in that it encodes the replication enhancement protein (gene 1), a hypothetical protein (gene 3) and glyoxylate carboligase (gene 5).
The phylogenetic result shows that the geminiviruses can be classified into an East Asia group (China, Taiwan and Japan) and a Southeast Asia group (Malaysia, Indonesia, the Philippines and Vietnam). The unknown geminivirus (candidate virus) falls in the Southeast Asia group. The phylogenetic tree indicates that the unknown geminivirus shares a common ancestor with the sequence recorded as "Tobacco leaf curl Indonesia virus C1, V2, V1 genes for replication-associated protein, putative V2 protein, coat protein, partial and complete cds". These results suggest that the unknown geminivirus could be a Southeast Asian strain and could attack tobacco plants. The main aim of this study was to demonstrate the process of identifying unknown sequence reads generated by the Illumina Genome Analyzer.
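The gene prediction step in the description can be illustrated with a much simpler technique than the one the thesis used. The sketch below is a naive six-frame ORF scan (ATG to the first in-frame stop codon), not GeneMark.hmm, which relies on a hidden Markov model. The function names, the `min_len` threshold and the treatment of the scaffold as a linear sequence are all assumptions made for this illustration.

```python
# Naive six-frame ORF scan: an illustrative stand-in, NOT GeneMark.hmm.
# Function names and the min_len default are assumptions for this sketch.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=90):
    """Yield (strand, frame, start, end) for each ATG..stop ORF.

    Only ORFs of at least min_len nucleotides (stop codon included) are
    reported. start and end are 0-based, end-exclusive offsets on the
    strand being scanned; the sequence is treated as linear.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # offset of the first ATG seen since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        yield strand, frame, start, i + 3
                    start = None
```

Calling `list(find_orfs(scaffold_sequence))` on a scaffold string would list candidate ORFs on both strands. A scan like this over-predicts compared with a model-based gene finder, so its output should be read as candidates only.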
format Thesis
author Mat @ Mohamad, Nurul Jannah
title De Novo assembly of an unknown geminivirus / Nurul Jannah Binti Mat @ Mohamad
publishDate 2014
url http://studentsrepo.um.edu.my/6420/1/Research_report%2DSGJ130002.pdf
http://studentsrepo.um.edu.my/6420/