Novel Preprocessing and Normalization Methods for Analysis of GC/LC-MS Data
We introduce new methods for preprocessing and normalization of data acquired by gas or liquid chromatography coupled with mass spectrometry (GC/LC-MS). Normalization is needed before statistical analysis to adjust for variability in ion intensities that is not caused by biological differences. Sources of experimental bias include variability in sample collection and storage, poor experimental design, and noise. In addition, instrument variability in experiments involving a large number of runs leads to significant drift in intensity measurements. We propose new normalization methods based on bootstrapping, Gaussian process regression, non-negative matrix factorization (NMF), and Bayesian hierarchical models. These methods model the bias by borrowing information across runs and features. Another novel aspect is the use of scan-level data to improve the accuracy of quantification. We evaluated the performance of our methods using simulated and experimental data. In comparison with several existing methods, the proposed methods yielded significant improvement.
Gas chromatography coupled with mass spectrometry (GC-MS) is one of the technologies widely used for qualitative and quantitative analysis of small molecules. In particular, GC coupled to single quadrupole MS can be utilized for targeted analysis by selected ion monitoring (SIM). However, to our knowledge, there are no software tools specifically designed for analysis of GC-SIM-MS data. We introduce SIMAT, a new R package for quantitative analysis of the levels of targeted analytes. SIMAT provides guidance in choosing fragments for a list of targets. This is accomplished through an optimization algorithm that can select the most appropriate fragments from overlapping peaks based on a pre-specified library of background analytes.
The tool also allows visualization of the total ion chromatogram (TIC) of runs and extracted ion chromatogram (EIC) of analytes of interest. Moreover, retention index (RI) calibration can be performed and raw GC-SIM-MS data can be imported in netCDF or NIST mass spectral library (MSL) formats. We evaluated the performance of SIMAT using several experimental data sets. Our results demonstrate that SIMAT performs better than AMDIS and MetaboliteDetector in terms of finding the correct targets in the acquired GC-SIM-MS data and estimating their relative levels.
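To make the two chromatogram types mentioned above concrete, the following minimal Python sketch computes a TIC and an EIC from scan-level data. It is an illustration only and not SIMAT's implementation (SIMAT is an R package); the function names, the `tol` parameter, and the toy scans are hypothetical.

```python
def total_ion_chromatogram(scans):
    """Return (retention time, summed intensity of all ions) per scan."""
    return [(rt, sum(peaks.values())) for rt, peaks in scans]


def extracted_ion_chromatogram(scans, target_mz, tol=0.5):
    """Return (retention time, summed intensity of ions within tol of target_mz) per scan."""
    return [
        (rt, sum(i for mz, i in peaks.items() if abs(mz - target_mz) <= tol))
        for rt, peaks in scans
    ]


# Hypothetical toy data: (retention time in minutes, {m/z: intensity}).
scans = [
    (5.00, {73.0: 100.0, 147.0: 40.0}),
    (5.01, {73.0: 250.0, 147.0: 90.0, 205.0: 10.0}),
    (5.02, {73.0: 120.0, 147.0: 30.0}),
]

tic = total_ion_chromatogram(scans)
eic_147 = extracted_ion_chromatogram(scans, 147.0)
```

The TIC sums every monitored ion in a scan, whereas the EIC keeps only the ions near one fragment's m/z, which is why the EIC is the natural view for a single targeted analyte.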