Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers

Three major aspects of chemometrics were investigated in this study: Quantitative Structure-Activity Relationship (QSAR) and database mining, classification, and multiblock methods. In the first analysis, 197 artemisinin compounds were divided into a training set and a test set together with s...

Bibliographic Details
Main Author: Jamaludin, Rosmahaida
Format: Thesis
Language: English
Published: 2015
Subjects: QD Chemistry
Online Access: http://eprints.utm.my/id/eprint/61066/1/RosmahaidaJamaludinPFS2015.pdf
http://eprints.utm.my/id/eprint/61066/
http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:96405
id my.utm.61066
record_format eprints
spelling my.utm.61066 2017-10-08T08:57:57Z http://eprints.utm.my/id/eprint/61066/ Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers Jamaludin, Rosmahaida QD Chemistry

Three major aspects of chemometrics were investigated in this study: Quantitative Structure-Activity Relationship (QSAR) and database mining, classification, and multiblock methods. In the first analysis, 197 artemisinin compounds were divided into a training set and a test set and, together with structural descriptors generated by DRAGON 6.0 software, were used to develop three QSAR models. The model statistics (r²/r²_test) were 0.790/0.853 for Forward Stepwise-Multiple Linear Regression (MLR), 0.807/0.789 for Genetic Algorithm (GA)-MLR and 0.795/0.811 for GA-Partial Least Squares (PLS). The rigorously validated QSAR models were then applied to mine a chemical database, which yielded four potential new anti-malarial agents.

The same artemisinin data set was then classified into active and less active compounds to develop reliable predictive classification models and to investigate the consequences of using various data-splitting and data pre-processing methods on classification. Principal Component Analysis (PCA) and boundary plots were used to visualize the four classifiers, namely Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Learning Vector Quantization (LVQ) and Quadratic Discriminant Analysis (QDA). Kennard-Stone data splitting and standardization produced better results, in terms of percent correctly classified (% CC), than Duplex data splitting and mean-centering. Moreover, LDA was found to be superior to the other three classifiers, with a lower risk of over-fitting.

Lastly, multiblock analysis methods such as Multiblock PLS and Consensus PCA were applied to a polychlorinated diphenyl ether (PCDE) data set whose descriptors were grouped into three blocks, labelled X_1D, X_2D and X_3D, together with a property block, Y, consisting of log P_L (Pa, 25°C), log K_OW (25°C) and log S_WL (mol/L, 25°C). Their performance was then compared with that of the single-block methods PLS and PCA. The PLS models of each descriptor block with respect to each property were statistically well fitted and predictive, with r²_train values greater than 0.96 and r_test values ranging from 0.86 to 0.98. Notably, combining the three descriptor blocks into a single block to produce the Multiblock PLS super-scores (MBSS) model, which was superior to the Multiblock PLS block-scores (MBBS) model, yielded a slightly better r²_train value and significantly better prediction, with a higher r_test, than the PLS models of the individual descriptor blocks. In addition, three measures of block similarity, namely the Mantel test, the RV coefficient and Procrustes analysis, were used to investigate similarity and correlation between the blocks, with Monte Carlo simulations to determine their significance. Based on the similarity index between two blocks, the X_jD descriptors resembled the Y block better, while X_2D was more correlated with the X_1D block. In short, the chemometric methods were applied successfully to both data sets using various descriptors generated by DRAGON software and yielded promising results that are beneficial not only in chemometrics but also in drug design.

2015-09 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/61066/1/RosmahaidaJamaludinPFS2015.pdf Jamaludin, Rosmahaida (2015) Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers. PhD thesis, Universiti Teknologi Malaysia, Faculty of Science.
http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:96405
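The abstract mentions the RV coefficient as one measure of block similarity, with Monte Carlo simulations used to judge significance. The sketch below shows one common way to compute such a measure. It is not taken from the thesis: the function names, the permutation scheme and the synthetic blocks are assumptions made purely for illustration, and the thesis may have used different software or a different randomization procedure.

```python
import numpy as np


def rv_coefficient(X, Y):
    """RV coefficient between two blocks that share the same rows (samples)."""
    # Column-centre each block so the measure reflects covariance structure.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Sample-by-sample configuration matrices (n x n, symmetric).
    Sx = Xc @ Xc.T
    Sy = Yc @ Yc.T
    # For symmetric matrices, trace(Sx @ Sy) equals the elementwise sum below.
    numerator = np.sum(Sx * Sy)
    denominator = np.sqrt(np.sum(Sx * Sx) * np.sum(Sy * Sy))
    return numerator / denominator


def rv_permutation_pvalue(X, Y, n_perm=999, seed=0):
    """Monte Carlo significance: shuffle the rows of Y and recompute RV.

    Hypothetical helper for illustration; returns (observed RV, p-value).
    """
    rng = np.random.default_rng(seed)
    observed = rv_coefficient(X, Y)
    n_at_least = 0
    for _ in range(n_perm):
        perm = rng.permutation(Y.shape[0])
        if rv_coefficient(X, Y[perm]) >= observed:
            n_at_least += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (n_at_least + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Synthetic stand-ins for a descriptor block and a property block.
    rng = np.random.default_rng(42)
    X_block = rng.normal(size=(30, 5))
    Y_block = X_block[:, :3] + 0.1 * rng.normal(size=(30, 3))
    rv, p = rv_permutation_pvalue(X_block, Y_block)
    print(f"RV = {rv:.3f}, permutation p = {p:.3f}")
```

The RV coefficient lies between 0 and 1, and a value near 1 indicates that the two blocks arrange the samples in a similar way. Shuffling the rows of one block breaks the sample correspondence, so the permuted values approximate what would be seen if the blocks were unrelated.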
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
language English
topic QD Chemistry
format Thesis
author Jamaludin, Rosmahaida
title Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers
publishDate 2015
url http://eprints.utm.my/id/eprint/61066/1/RosmahaidaJamaludinPFS2015.pdf
http://eprints.utm.my/id/eprint/61066/
http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:96405