Protein sequences classification based on weighting scheme

We present a new technique for recognizing remote protein homologies that relies on combining probabilistic modeling and supervised learning in high-dimensional feature spaces. The main novelty of our technique is the method of constructing feature vectors using a Hidden Markov Model and the combination of...

Bibliographic Details
Main Authors: Zaki, N. M., Deris, Safaai, Md Illias, Rosli
Format: Article
Language: en
Published: Assumption University 2005
Subjects: T Technology (General)
Online Access: http://eprints.utm.my/5576/1/N.M.Zaki2005_ProteinSequencesClassificationBasedOn.pdf
http://eprints.utm.my/5576/
author Zaki, N. M.
Deris, Safaai
Md Illias, Rosli
building UTM Library
collection Institutional Repository
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
continent Asia
country Malaysia
description We present a new technique for recognizing remote protein homologies that relies on combining probabilistic modeling and supervised learning in high-dimensional feature spaces. The main novelty of our technique is the method of constructing feature vectors using a Hidden Markov Model and the combination of this representation with a classifier capable of learning in very sparse high-dimensional spaces. Each feature vector records the sensitivity of a protein domain to a previously learned set of sub-sequences (strings). Unlike previous methods, our method takes into consideration both the conserved and the non-conserved regions. The system then uses Support Vector Machine (SVM) classifiers to learn the boundaries between structural protein classes. Experiments show that this method, which we call the String Weighting Scheme-SVM (SWS-SVM) method, significantly improves on previous methods for the classification of protein domains based on remote homologies. Our method is compared to five existing homology detection methods.
format Article
id my.utm.eprints-5576
institution Universiti Teknologi Malaysia
language en
publishDate 2005
publisher Assumption University
record_format eprints
spelling my.utm.eprints-5576 2010-06-01T15:32:30Z http://eprints.utm.my/5576/ Protein sequences classification based on weighting scheme Zaki, N. M. Deris, Safaai Md Illias, Rosli T Technology (General) Assumption University 2005 Article PeerReviewed application/pdf en http://eprints.utm.my/5576/1/N.M.Zaki2005_ProteinSequencesClassificationBasedOn.pdf Zaki, N. M. and Deris, Safaai and Md Illias, Rosli (2005) Protein sequences classification based on weighting scheme. International Journal of Computer, the Internet and Management, 13 (1). pp. 50-60.
title Protein sequences classification based on weighting scheme
topic T Technology (General)
url http://eprints.utm.my/5576/1/N.M.Zaki2005_ProteinSequencesClassificationBasedOn.pdf
http://eprints.utm.my/5576/
url_provider http://eprints.utm.my/
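
Illustrative note on the description field: the abstract outlines a two-stage pipeline in which each protein domain is mapped to a feature vector of scores against a learned set of strings, and an SVM is then trained on those vectors. The Python sketch below shows only that general shape and is not the authors' SWS-SVM implementation. It assumes plain substring counts in place of the Hidden Markov Model based string weights, and a small hand-written linear SVM in place of a full SVM package. The function names, string set, and toy sequences are all hypothetical.

```python
# Sketch only: substring counts stand in for the HMM-derived string weights
# described in the abstract, and a tiny linear SVM stands in for a full SVM
# library. All names and the toy data below are hypothetical.

def string_features(sequence, strings):
    """Score a sequence against each string by counting overlapping occurrences."""
    feats = []
    for s in strings:
        count = sum(1 for i in range(len(sequence) - len(s) + 1)
                    if sequence[i:i + len(s)] == s)
        feats.append(float(count))
    return feats


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit weights w and bias b by subgradient descent on the regularized hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The L2 regularizer shrinks w on every step.
            w = [wj - lr * lam * wj for wj in w]
            if margin < 1:
                # Hinge loss is active: push the boundary away from xi.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the learned boundary."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical string set and labeled toy sequences (+1 / -1 are two classes).
strings = ["GK", "HE", "DD"]
train = [("MGKGKTLHE", 1), ("AGKSTHEGK", 1),
         ("MDDLLDDAA", -1), ("QDDPPDDRR", -1)]

X = [string_features(seq, strings) for seq, _ in train]
y = [label for _, label in train]
w, b = train_linear_svm(X, y)

for seq in ["GKGKHEHE", "DDADDADD"]:
    print(seq, predict(w, b, string_features(seq, strings)))
```

In this sketch the feature vector has one dimension per string, so a larger string set yields a higher-dimensional and typically sparser representation, which is the setting the abstract says the classifier must handle.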