Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers
This study investigates three major aspects of chemometrics: Quantitative Structure-Activity Relationship (QSAR) modelling with database mining, classification, and multiblock methods. In the first analysis, 197 artemisinin compounds were divided into a training set and a test set together with s...
Main Author: | Jamaludin, Rosmahaida |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2015 |
Subjects: | QD Chemistry |
Online Access: | http://eprints.utm.my/id/eprint/61066/1/RosmahaidaJamaludinPFS2015.pdf http://eprints.utm.my/id/eprint/61066/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:96405 |
id |
my.utm.61066 |
---|---|
record_format |
eprints |
spelling |
my.utm.61066 2017-10-08T08:57:57Z http://eprints.utm.my/id/eprint/61066/ Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers Jamaludin, Rosmahaida QD Chemistry This study investigates three major aspects of chemometrics: Quantitative Structure-Activity Relationship (QSAR) modelling with database mining, classification, and multiblock methods. In the first analysis, 197 artemisinin compounds were divided into a training set and a test set, and structural descriptors generated by DRAGON 6.0 software were used to develop three QSAR models. The model statistics (r²/r²(test)) were 0.790/0.853 for Forward Stepwise-Multiple Linear Regression (MLR), 0.807/0.789 for Genetic Algorithm (GA)-MLR and 0.795/0.811 for GA-Partial Least Squares (PLS). The rigorously validated QSAR models were then applied to mine a chemical database, which yielded four potential new anti-malarial agents. The same artemisinin data set was then classified into active and less active compounds to develop reliable predictive classification models and to investigate how different data splitting and data pre-processing methods affect classification. Principal Component Analysis (PCA) and boundary plots were used to visualize four classifiers: Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Learning Vector Quantization (LVQ) and Quadratic Discriminant Analysis (QDA). Kennard-Stone data splitting and standardization produced better results in terms of percent correctly classified (% CC) than Duplex data splitting and mean-centering. Moreover, LDA was found to be superior to the other three classifiers, with a lower risk of over-fitting.
Lastly, multiblock analysis methods such as Multiblock PLS and Consensus PCA were applied to a polychlorinated diphenyl ether (PCDE) data set whose descriptors were grouped into three blocks labelled X_1D, X_2D and X_3D, together with a property block, Y, consisting of log P_L (Pa, 25°C), log K_OW (25°C) and log S_WL (mol/L, 25°C). Their performance was then compared with that of the single-block methods PLS and PCA. The PLS models of each descriptor block with respect to each property were well fitted and predicted well, with r²(train) values greater than 0.96 and rtest values ranging from 0.86 to 0.98. Notably, combining the three descriptor blocks to produce the Multiblock PLS superscores (MBSS) model, which was superior to the Multiblock PLS block-scores (MBBS) model, yielded a slightly better r²(train) value and significantly better prediction (higher rtest) than the PLS models of the individual descriptor blocks. In addition, three measures of block similarity (the Mantel test, the Rv coefficient and Procrustes analysis) were used to investigate similarity and correlation between the blocks, with Monte Carlo simulations to determine their significance. Based on the similarity index between two blocks, X jD descriptors resembled the Y block better, while X_2D was more correlated with the X_1D block. In short, the chemometric methods were applied successfully to both data sets using various descriptors generated by DRAGON software and yielded promising results that are beneficial not only in chemometrics but also in drug design. 2015-09 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/61066/1/RosmahaidaJamaludinPFS2015.pdf Jamaludin, Rosmahaida (2015) Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers. PhD thesis, Universiti Teknologi Malaysia, Faculty of Science.
http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:96405 |
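The abstract mentions the Rv coefficient, with Monte Carlo simulations for significance, as one measure of similarity between descriptor blocks. The following is a minimal illustrative sketch of that idea, not the thesis implementation: the function names `rv_coefficient` and `rv_permutation_pvalue`, the use of row permutation as the Monte Carlo scheme, and the synthetic data are all assumptions made here for illustration.

```python
import numpy as np


def rv_coefficient(X, Y):
    """RV coefficient between two blocks sharing the same rows (samples)."""
    # Column-centre each block so the comparison ignores column means.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Sample-by-sample configuration matrices (n x n).
    Sx = Xc @ Xc.T
    Sy = Yc @ Yc.T
    # For symmetric matrices, trace(Sx @ Sy) equals the elementwise sum.
    numerator = np.sum(Sx * Sy)
    denominator = np.sqrt(np.sum(Sx * Sx) * np.sum(Sy * Sy))
    return numerator / denominator


def rv_permutation_pvalue(X, Y, n_perm=999, seed=0):
    """Monte Carlo p-value: shuffle the rows of X to break the sample pairing."""
    rng = np.random.default_rng(seed)
    observed = rv_coefficient(X, Y)
    n_at_least = 0
    for _ in range(n_perm):
        perm = rng.permutation(X.shape[0])
        if rv_coefficient(X[perm], Y) >= observed:
            n_at_least += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (n_at_least + 1) / (n_perm + 1)


# Synthetic stand-ins for a descriptor block and a property block
# (hypothetical data, not the PCDE data set).
rng = np.random.default_rng(42)
X_block = rng.normal(size=(30, 5))
Y_block = X_block[:, :2] @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(30, 3))

rv, p_value = rv_permutation_pvalue(X_block, Y_block)
print(f"RV = {rv:.3f}, permutation p-value = {p_value:.3f}")
```

An Rv value close to 1 indicates that two blocks arrange the samples in nearly the same configuration, and the permutation p-value estimates how often a similarity at least that large would arise if the samples in the two blocks were unrelated.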
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
language |
English |
topic |
QD Chemistry |
format |
Thesis |
author |
Jamaludin, Rosmahaida |
title |
Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers |
publishDate |
2015 |
url |
http://eprints.utm.my/id/eprint/61066/1/RosmahaidaJamaludinPFS2015.pdf http://eprints.utm.my/id/eprint/61066/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:96405 |