Two-layer SVM classifier for remote protein homology detection and fold recognition
Advances in molecular biology in recent years have yielded an unprecedented number of new protein sequences. These sequences describe a protein in terms of its constituent amino acids, without structural or functional information. Therefore, remote protein homology detectio...
Main Author: | Muda, Mohd. Hilmi |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2009 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://eprints.utm.my/id/eprint/11488/6/MohdHilmiMudaMFSKSM2009.pdf http://eprints.utm.my/id/eprint/11488/ |
id |
my.utm.11488 |
---|---|
record_format |
eprints |
spelling |
my.utm.11488 2017-09-18T10:11:02Z http://eprints.utm.my/id/eprint/11488/ Two-layer SVM classifier for remote protein homology detection and fold recognition Muda, Mohd. Hilmi QA75 Electronic computers. Computer science Advances in molecular biology in recent years have yielded an unprecedented number of new protein sequences. These sequences describe a protein in terms of its constituent amino acids, without structural or functional information. Remote protein homology detection and fold recognition algorithms have therefore become increasingly important for detecting structural homology between proteins whose sequences show little or no similarity. However, detecting and classifying this similarity in a biologically meaningful way, in the context of the Structural Classification of Proteins (SCOP) database, is a challenging task. This study presents a new computational framework based on a two-layer SVM classifier that uses protein sequences as its primary source. The first layer performs detection up to the superfamily level of the SCOP hierarchy using one-versus-all binary SVM classifiers and the Bio-kernel function. The second layer uses SVMs with fold recognition codes and the profile-string kernel to leverage unlabeled data and to perform detection up to the fold level of the SCOP hierarchy. The proposed framework is tested on the SCOP 1.53, 1.67 and 1.73 datasets, and the results are evaluated using the mean Receiver Operating Characteristic (ROC) score and the mean Median Rate of False Positives (MRFP). In terms of mean ROC, the experiments show improvements of 4.19% on SCOP 1.53, 4.75% on SCOP 1.67 and 4.03% on SCOP 1.73 over existing SVM-based classifiers and kernel functions. These results show that the proposed framework performs well across different dataset versions and outperforms existing methods, which suggests that the framework is reliable.
2009-11 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/11488/6/MohdHilmiMudaMFSKSM2009.pdf Muda, Mohd. Hilmi (2009) Two-layer SVM classifier for remote protein homology detection and fold recognition. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information Systems. |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
language |
English |
topic |
QA75 Electronic computers. Computer science |
description |
Advances in molecular biology in recent years have yielded an unprecedented number of new protein sequences. These sequences describe a protein in terms of its constituent amino acids, without structural or functional information. Remote protein homology detection and fold recognition algorithms have therefore become increasingly important for detecting structural homology between proteins whose sequences show little or no similarity. However, detecting and classifying this similarity in a biologically meaningful way, in the context of the Structural Classification of Proteins (SCOP) database, is a challenging task. This study presents a new computational framework based on a two-layer SVM classifier that uses protein sequences as its primary source. The first layer performs detection up to the superfamily level of the SCOP hierarchy using one-versus-all binary SVM classifiers and the Bio-kernel function. The second layer uses SVMs with fold recognition codes and the profile-string kernel to leverage unlabeled data and to perform detection up to the fold level of the SCOP hierarchy. The proposed framework is tested on the SCOP 1.53, 1.67 and 1.73 datasets, and the results are evaluated using the mean Receiver Operating Characteristic (ROC) score and the mean Median Rate of False Positives (MRFP). In terms of mean ROC, the experiments show improvements of 4.19% on SCOP 1.53, 4.75% on SCOP 1.67 and 4.03% on SCOP 1.73 over existing SVM-based classifiers and kernel functions. These results show that the proposed framework performs well across different dataset versions and outperforms existing methods, which suggests that the framework is reliable. |
format |
Thesis |
author |
Muda, Mohd. Hilmi |
title |
Two-layer SVM classifier for remote protein homology detection and fold recognition |
publishDate |
2009 |
url |
http://eprints.utm.my/id/eprint/11488/6/MohdHilmiMudaMFSKSM2009.pdf http://eprints.utm.my/id/eprint/11488/ |
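The description evaluates results with the mean ROC score and the mean Median Rate of False Positives (MRFP). The sketch below illustrates one common way such metrics are computed, not the thesis's own code. It assumes that the ROC score is the area under the ROC curve and that the MRFP for a family is the fraction of negative examples scoring at or above the median positive score. The function names, the `toy` dictionary and all scores in it are hypothetical and exist only for illustration.

```python
from statistics import median


def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def median_rfp(pos_scores, neg_scores):
    """Fraction of negatives scoring at or above the median positive score
    (assumed definition of the median rate of false positives)."""
    threshold = median(pos_scores)
    hits = sum(1 for n in neg_scores if n >= threshold)
    return hits / len(neg_scores)


def mean_over_families(per_family_scores, metric):
    """Average a per-family metric over all families in the benchmark."""
    values = [metric(pos, neg) for pos, neg in per_family_scores.values()]
    return sum(values) / len(values)


# Hypothetical classifier outputs: family -> (positive scores, negative scores).
toy = {
    "family_a": ([0.9, 0.8, 0.4], [0.5, 0.3, 0.2, 0.1]),
    "family_b": ([0.7, 0.6, 0.1], [0.65, 0.2]),
}

if __name__ == "__main__":
    print("mean ROC :", mean_over_families(toy, roc_auc))
    print("mean MRFP:", mean_over_families(toy, median_rfp))
```

A higher mean ROC and a lower mean MRFP both indicate better separation of true homologs from non-homologs. The pairwise loop is quadratic in the number of scores, which keeps the sketch short but would be replaced by a sort-based computation on large benchmarks.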