Staff View: A clustering-based preprocessing method for the elimination of unwanted residuals in metabolomic data

Introduction: The metabolome of a biological system is affected by multiple factors, including the factor of interest (e.g. metabolic perturbation due to disease) and unwanted factors or factors that are not the primary focus of the study (e.g. batch effect, gender, and level of physical activity). Re...

Bibliographic Details
Main Authors:	Wang, W., Cheng, K. K., Deng, L., Xu, J., Shen, G., Griffin, J. L., Dong, J.
Format:	Article
Published:	Springer New York LLC 2017
Subjects:	TP Chemical technology
Online Access:	http://eprints.utm.my/id/eprint/76957/ https://www.scopus.com/inward/record.uri?eid=2-s2.0-85006757598&doi=10.1007%2fs11306-016-1146-y&partnerID=40&md5=7c26f1e3daaa1340ae08c70be1798666

id	my.utm.76957
record_format	eprints
spelling	my.utm.769572018-04-30T14:27:19Z http://eprints.utm.my/id/eprint/76957/ A clustering-based preprocessing method for the elimination of unwanted residuals in metabolomic data Wang, W. Cheng, K. K. Deng, L. Xu, J. Shen, G. Griffin, J. L. Dong, J. TP Chemical technology Introduction: The metabolome of a biological system is affected by multiple factors including factor of interest (e.g. metabolic perturbation due to disease) and unwanted factors or factors which are not primarily the focus of the study (e.g. batch effect, gender, and level of physical activity). Removal of these unwanted data variations is advantageous, as the unwanted variations may complicate biological interpretation of the data. Objectives: We aim to develop a new unwanted variations elimination (UVE) method called clustering-based unwanted residuals elimination (CURE) to reduce metabolic variation caused by unwanted/hidden factors in metabolomic data. Methods: A mean-centered metabolomic dataset can be viewed as a combination of a studied factor matrix and a residual matrix. The CURE method assumes that the residual should be normally distributed if it only contains inter-individual variation. However, if the residual forms multiple clusters in feature subspace of principal components analysis or partial least squares discriminant analysis, the residual may contain variation due to unwanted factors. This unwanted variation is removed by doing K-means data clustering and removal of means for each cluster from the residuals. The process is iterated until the residual no longer forms multiple clusters in feature subspace. Results: Three simulated datasets and a human metabolomic dataset were used to demonstrate the performance of the proposed CURE method. CURE was found able to remove most of the variations caused by unwanted factors, while preserving inter-individual variation between samples. 
Conclusion: The CURE method can effectively remove unwanted data variation, and can serve as an alternative UVE method for metabolomic data. Springer New York LLC 2017 Article PeerReviewed Wang, W. and Cheng, K. K. and Deng, L. and Xu, J. and Shen, G. and Griffin, J. L. and Dong, J. (2017) A clustering-based preprocessing method for the elimination of unwanted residuals in metabolomic data. Metabolomics, 13 (1). ISSN 1573-3882 https://www.scopus.com/inward/record.uri?eid=2-s2.0-85006757598&doi=10.1007%2fs11306-016-1146-y&partnerID=40&md5=7c26f1e3daaa1340ae08c70be1798666 DOI:10.1007/s11306-016-1146-y
institution	Universiti Teknologi Malaysia
building	UTM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Malaysia
content_source	UTM Institutional Repository
url_provider	http://eprints.utm.my/
topic	TP Chemical technology
spellingShingle	TP Chemical technology Wang, W. Cheng, K. K. Deng, L. Xu, J. Shen, G. Griffin, J. L. Dong, J. A clustering-based preprocessing method for the elimination of unwanted residuals in metabolomic data
description	Introduction: The metabolome of a biological system is affected by multiple factors, including the factor of interest (e.g. metabolic perturbation due to disease) and unwanted factors or factors that are not the primary focus of the study (e.g. batch effect, gender, and level of physical activity). Removing this unwanted variation is advantageous, because it may complicate biological interpretation of the data. Objectives: We aim to develop a new unwanted variations elimination (UVE) method called clustering-based unwanted residuals elimination (CURE) to reduce metabolic variation caused by unwanted/hidden factors in metabolomic data. Methods: A mean-centered metabolomic dataset can be viewed as a combination of a studied factor matrix and a residual matrix. The CURE method assumes that the residual should be normally distributed if it contains only inter-individual variation. However, if the residual forms multiple clusters in the feature subspace of principal components analysis or partial least squares discriminant analysis, the residual may contain variation due to unwanted factors. This unwanted variation is removed by applying K-means clustering and subtracting the mean of each cluster from the residuals. The process is iterated until the residual no longer forms multiple clusters in the feature subspace. Results: Three simulated datasets and a human metabolomic dataset were used to demonstrate the performance of the proposed CURE method. CURE was found to remove most of the variation caused by unwanted factors while preserving inter-individual variation between samples. Conclusion: The CURE method can effectively remove unwanted data variation and can serve as an alternative UVE method for metabolomic data.
format	Article
author	Wang, W. Cheng, K. K. Deng, L. Xu, J. Shen, G. Griffin, J. L. Dong, J.
author_facet	Wang, W. Cheng, K. K. Deng, L. Xu, J. Shen, G. Griffin, J. L. Dong, J.
author_sort	Wang, W.
title	A clustering-based preprocessing method for the elimination of unwanted residuals in metabolomic data
title_short	A clustering-based preprocessing method for the elimination of unwanted residuals in metabolomic data
title_full	A clustering-based preprocessing method for the elimination of unwanted residuals in metabolomic data
title_fullStr	A clustering-based preprocessing method for the elimination of unwanted residuals in metabolomic data
title_full_unstemmed	A clustering-based preprocessing method for the elimination of unwanted residuals in metabolomic data
title_sort	clustering-based preprocessing method for the elimination of unwanted residuals in metabolomic data
publisher	Springer New York LLC
publishDate	2017
url	http://eprints.utm.my/id/eprint/76957/ https://www.scopus.com/inward/record.uri?eid=2-s2.0-85006757598&doi=10.1007%2fs11306-016-1146-y&partnerID=40&md5=7c26f1e3daaa1340ae08c70be1798666
_version_	1643657457559404544
score	13.211869
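
The Methods part of the description field above outlines an iterative procedure: split the mean-centered data into a studied factor matrix and a residual, cluster the residual in a principal components subspace with K-means, subtract each cluster's mean, and repeat until no clusters remain. The following is a minimal sketch of that loop, not the authors' implementation. The record does not say how "forms multiple clusters" is tested, so the mean-silhouette threshold of 0.5, the choice of k = 2, the use of two principal components, the per-group means as the studied factor matrix, and all function names are assumptions made for illustration only.

```python
import numpy as np


def pca_scores(matrix, n_components=2):
    """Project rows onto the leading principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns one integer label per row."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette width; used here as an ASSUMED cluster-detection rule."""
    unique = np.unique(labels)
    if len(unique) < 2:
        return 0.0
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    scores = []
    for i in range(len(points)):
        same = labels == labels[i]
        if same.sum() < 2:
            scores.append(0.0)
            continue
        a = d[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(d[i, labels == c].mean() for c in unique if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


def cure_sketch(X, groups, k=2, threshold=0.5, max_rounds=10):
    """Hypothetical CURE-style loop; k, threshold and max_rounds are assumptions."""
    groups = np.asarray(groups)
    Xc = X - X.mean(axis=0)  # mean-center the dataset
    factor = np.zeros_like(Xc)
    for g in np.unique(groups):  # studied factor matrix: per-group means (assumed)
        factor[groups == g] = Xc[groups == g].mean(axis=0)
    residual = Xc - factor
    for _ in range(max_rounds):
        scores = pca_scores(residual)
        labels = kmeans(scores, k)
        if mean_silhouette(scores, labels) < threshold:
            break  # residual no longer looks clustered under the assumed rule
        for j in np.unique(labels):  # subtract each cluster's mean from the residual
            residual[labels == j] -= residual[labels == j].mean(axis=0)
    return factor + residual
```

Calling `cure_sketch(X, groups)` with a samples-by-features array `X` and one studied-factor label per sample returns a corrected, mean-centered matrix of the same shape. The description also mentions partial least squares discriminant analysis as an alternative subspace; this sketch covers only the principal components case.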

Similar Items