Zhan-long Mei:
|
With advances in mass spectrometry (MS) techniques, the throughput and coverage of untargeted
metabolomics have greatly improved, making it a powerful tool for screening altered
metabolites associated with phenotypes or stimulations. Tens of thousands of metabolites can be
detected in a single run and quantitatively measured as the peak areas of MS features. Because
quantitation relies on peak areas, its precision is affected by a variety of factors, such as
batch effects, ionization efficiency, and dilution effects. To correct for this unwanted variation,
several data processing and pretreatment steps are needed, and multiple algorithms have been
developed. However, there has so far been no widely accepted workflow for untargeted
metabolomics data analysis. On the one hand, integrated pipelines implement only a limited
selection of data analysis algorithms, without systematic evaluation of their performance. On the
other hand, users are encouraged to try different data processing algorithms and choose the best
one themselves.
In this study, I developed MetaboPro, an expert analysis system for untargeted metabolomics data
that implements multiple approaches and provides systematic evaluations of them. I showed
that there is no single solution for missing value imputation, batch effect removal, sample
normalization, transformation, or scaling: the performance of different approaches to each of
these steps varied widely, and no approach consistently outperformed the others across datasets. I
reviewed and compared the existing evaluation criteria for each analysis step and implemented the
commonly used data analysis approaches and evaluation criteria in MetaboPro. In addition, I
developed a stepwise imputation strategy that classifies missing values into three classes and
imputes each class according to its origin, which greatly improved imputation accuracy.
This expert analysis system may serve the community in at least three ways. First, MetaboPro
provides guidance on the necessity of each data processing step and on how to evaluate its
outcome step by step, which helps new users better understand the data analysis. Second, this
integrated tool greatly improves the robustness of the statistical results, leading to more precise
interpretation of phenotypes. Finally, this study will advance the development of untargeted
metabolomics data analysis and speed up the formation of a widely accepted workflow.