A clustering-based preprocessing method for the elimination of unwanted residuals in metabolomic data

Bibliographic Details
Main Authors: Wang, W., Cheng, K. K., Deng, L., Xu, J., Shen, G., Griffin, J. L., Dong, J.
Format: Article
Published: Springer New York LLC, 2017
Subjects: TP Chemical technology
Online Access: http://eprints.utm.my/id/eprint/76957/
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85006757598&doi=10.1007%2fs11306-016-1146-y&partnerID=40&md5=7c26f1e3daaa1340ae08c70be1798666
Journal: Metabolomics, 13 (1)
ISSN: 1573-3882
DOI: 10.1007/s11306-016-1146-y
Status: Peer reviewed
Repository: UTM Institutional Repository, Universiti Teknologi Malaysia

Abstract

Introduction: The metabolome of a biological system is affected by multiple factors: the factor of interest (e.g. metabolic perturbation due to disease) and unwanted factors, i.e. factors that are not the primary focus of the study (e.g. batch effect, gender, and level of physical activity). Removing these unwanted variations is advantageous because they may complicate biological interpretation of the data.

Objectives: We aim to develop a new unwanted variation elimination (UVE) method, clustering-based unwanted residuals elimination (CURE), to reduce metabolic variation caused by unwanted or hidden factors in metabolomic data.

Methods: A mean-centered metabolomic dataset can be viewed as the sum of a studied-factor matrix and a residual matrix. CURE assumes that the residual should be normally distributed if it contains only inter-individual variation. If, however, the residual forms multiple clusters in the feature subspace of principal component analysis (PCA) or partial least squares discriminant analysis (PLS-DA), it may contain variation due to unwanted factors. This unwanted variation is removed by K-means clustering of the residual and subtracting each cluster's mean from the residuals of its members. The process is iterated until the residual no longer forms multiple clusters in the feature subspace.

Results: Three simulated datasets and one human metabolomic dataset were used to demonstrate the performance of CURE. CURE removed most of the variation caused by unwanted factors while preserving inter-individual variation between samples.

Conclusion: CURE can effectively remove unwanted data variation and can serve as an alternative UVE method for metabolomic data.
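The iterative loop described under Methods can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the abstract does not state: the studied-factor matrix is taken to be the per-class mean of the mean-centered data; the feature subspace is the first two PCA scores of the residual (PLS-DA is not used); and "forms multiple clusters" is tested with a hypothetical rule, namely that a K-means split explains more than 80% of the score variance. The function names (`cure_sketch`, `kmeans`, `pca_scores`, `between_cluster_fraction`), the threshold, and the defaults are illustrative only and are not the authors' implementation.

```python
import numpy as np


def kmeans(points, k, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's K-means with random restarts; returns the labels of the lowest-SSE run."""
    rng = np.random.default_rng(seed)
    best_labels, best_sse = None, np.inf
    for _ in range(n_init):
        # Initialise centres from k distinct random samples.
        centers = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(n_iter):
            dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # Recompute centres; keep the old centre if a cluster becomes empty.
            new_centers = np.array([
                points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        sse = ((points - centers[labels]) ** 2).sum()
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels


def pca_scores(matrix, n_components):
    """Scores on the first principal components, via SVD of the column-centred matrix."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def between_cluster_fraction(scores, labels):
    """Share of total score variance explained by the cluster split (0 = none, 1 = all)."""
    total = ((scores - scores.mean(axis=0)) ** 2).sum()
    within = sum(((scores[labels == j] - scores[labels == j].mean(axis=0)) ** 2).sum()
                 for j in np.unique(labels))
    return 1.0 - within / total if total > 0 else 0.0


def cure_sketch(data, groups, n_clusters=2, n_components=2, threshold=0.8, max_iter=10):
    """Hypothetical CURE-style loop; threshold and subspace choice are assumptions."""
    x = data - data.mean(axis=0)  # mean-centre each feature
    # ASSUMPTION: studied-factor matrix = per-class mean of the centred data.
    studied = np.zeros_like(x)
    for g in np.unique(groups):
        studied[groups == g] = x[groups == g].mean(axis=0)
    residual = x - studied
    for _ in range(max_iter):
        scores = pca_scores(residual, n_components)  # ASSUMPTION: PCA subspace, not PLS-DA
        labels = kmeans(scores, n_clusters)
        # ASSUMPTION: "multiple clusters" means the split explains > threshold of the variance.
        if between_cluster_fraction(scores, labels) < threshold:
            break
        # Remove each cluster's mean (in the original feature space) from its members' residuals.
        for j in np.unique(labels):
            residual[labels == j] -= residual[labels == j].mean(axis=0)
    return studied + residual
```

On simulated data where half the samples carry a batch offset that is balanced across two classes, `cure_sketch(data, groups)` should largely remove the offset while keeping the class difference. For a purely Gaussian residual, a two-cluster split explains at most about 64% of the variance along one axis, which is why a hypothetical 0.8 cut-off was chosen. Real data would need a properly justified cluster test and tuned defaults.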