Browsing by Author "Kim, Sungwoo"
- Comprehensive detection of analytes in large chromatographic datasets by coupling factor analysis with a decision tree
  Kim, Sungwoo; Lerner, Brian M.; Sueper, Donna T.; Isaacman-VanWertz, Gabriel (Copernicus, 2022-09-05)
  Environmental samples typically contain hundreds or thousands of unique organic compounds, and even minor components may provide valuable insight into their sources and transformations. To understand atmospheric processes, individual components are frequently identified and quantified using gas chromatography-mass spectrometry. However, due to the complexity and frequently variable nature of such data, data reduction is a significant bottleneck in analysis. Consequently, only a subset of known analytes is often reported for a dataset, and large amounts of potentially useful data are discarded. We present an automated approach for cataloging and potentially identifying all analytes in a large chromatographic dataset and demonstrate the utility of our approach in an analysis of ambient aerosols. We use a coupled factor analysis-decision tree approach to deconvolute peaks and comprehensively catalog nearly all analytes in a dataset. Positive matrix factorization (PMF) of small subsections of multiple chromatograms is applied to extract factors that represent chromatographic profiles and mass spectra of potential analytes, in which peaks are detected. A decision tree based on peak parameters (e.g., location, width, and height), relative ratios of those parameters, peak shape, noise, retention time, and mass spectrum is applied to discard erroneous peaks and combine peaks determined to represent the same analyte. With our approach, all analytes within the small section of the chromatogram are cataloged, and the process is repeated for overlapping sections across the chromatogram, generating a complete list of the retention times and estimated mass spectra of all peaks in a dataset.
We validate this approach using samples of known compounds and demonstrate the separation of poorly resolved peaks with similar mass spectra and the resolution of peaks that appear in only a fraction of chromatograms. As a case study, this method is applied to a complex real-world dataset of the composition of atmospheric particles, in which more than 1100 unique chromatographic peaks are resolved, and the corresponding peak information along with mass spectra are cataloged.
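The factor-analysis step described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: it swaps in plain non-negative matrix factorization (NMF) with multiplicative updates as a simplified stand-in for PMF, which additionally weights residuals by measurement uncertainty. The synthetic window, the function names (`nmf`, `matmul`, `frobenius_error`), and all parameter values are hypothetical, and the code is kept in pure Python only to stay self-contained.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def transpose(a):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Frobenius norm of the residual x - w @ h."""
    wh = matmul(w, h)
    return math.sqrt(sum((xv - yv) ** 2
                         for xr, yr in zip(x, wh) for xv, yv in zip(xr, yr)))


def nmf(x, n_factors, n_iter=300, seed=0, eps=1e-12):
    """Factor x (scans x m/z) into w (scans x factors) and h (factors x m/z).

    Uses multiplicative updates for the unweighted least-squares objective.
    NOTE: a stand-in for PMF, which also uses per-point uncertainties.
    """
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(n_factors)] for _ in range(len(x))]
    h = [[rng.random() + 0.1 for _ in range(len(x[0]))] for _ in range(n_factors)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[hv * nv / (dv + eps) for hv, nv, dv in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[wv * nv / (dv + eps) for wv, nv, dv in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h


# Hypothetical window: two overlapping Gaussian peaks with distinct spectra.
n_scans = 40
peak_a = [math.exp(-((i - 17) ** 2) / 18.0) for i in range(n_scans)]
peak_b = [math.exp(-((i - 22) ** 2) / 18.0) for i in range(n_scans)]
spec_a = [1.0, 0.6, 0.0, 0.2, 0.0, 0.1]
spec_b = [0.0, 0.5, 1.0, 0.0, 0.3, 0.1]
window = [[pa * sa + pb * sb for sa, sb in zip(spec_a, spec_b)]
          for pa, pb in zip(peak_a, peak_b)]

# Columns of `profiles` approximate elution profiles; rows of `spectra`
# approximate the mass spectra of the candidate analytes.
profiles, spectra = nmf(window, n_factors=2)
```

In a workflow like the one the abstract outlines, peaks would then be detected in each column of `profiles` and screened by rules on width, height, shape, and spectral similarity before being added to the catalog. Because NMF solutions are not unique, the recovered factors may be mixtures of the true profiles, which is one reason a downstream filtering step is needed.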
- A Statistical Methods-Based Novel Approach for Fully Automated Analysis of Chromatographic Data
  Kim, Sungwoo (Virginia Tech, 2024-12-04)
  Atmospheric samples are complex mixtures that contain thousands of volatile organic compounds (VOCs) with diverse physicochemical properties and multiple isomers. These compounds can interact with nitrogen oxides, leading to the formation of ozone and particulate matter, which have detrimental effects on human health. Therefore, it is essential to apply effective analytical methods to obtain valuable information about the sources and transformation processes of these samples. Gas chromatography coupled with mass spectrometry (GC-MS) is a widely used method for the analysis of these complex mixtures due to its sensitivity and resolution. However, it presents significant challenges in data reduction and analyte identification due to the complexity and variability of atmospheric data. Traditional methods for processing large GC-MS datasets are highly time-consuming and may lead to the loss of potentially valuable information from relatively weak signals and to incomplete characterization of compounds. This study addresses these challenges. An automated approach is developed that catalogs and identifies nearly all analytes in large chromatographic datasets by combining factor analysis and a decision tree approach to deconvolute peaks. This approach was applied to data from the GoAmazon 2014/5 campaign and cataloged more than 1000 unique analytes. A novel method is then introduced to automatically identify quantification ions for single-ion chromatogram (SIC) based peak fitting and integration to generate time series of analytes. Through these combined approaches, a complex GC-MS dataset of atmospheric composition is reduced and processed fully automatically.
Additionally, a machine learning-based dimensionality reduction algorithm was applied to the generated time series data for systematic characterization and categorization of both identified and unidentified compounds, clustering them into 8 distinct groups based on their temporal variation. These data are then used to generate fundamental insight into the atmospheric processes that impact composition. This analysis aimed to elucidate the effects of meteorological conditions on these compounds, particularly the impact of wet deposition through precipitation scavenging on gas- and particle-phase oxygenated compounds. Hourly removal rates for all analytes were estimated by examining the impacts of precipitation on their concentration.
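The abstract does not state how the hourly removal rates were computed. As one possible illustration only, the sketch below assumes first-order loss during a precipitation event, C(t) = C0 · exp(−k·t), and estimates k as the negative least-squares slope of ln C against time. The function name `first_order_removal_rate` and the example concentrations are hypothetical and are not taken from the thesis.

```python
import math


def first_order_removal_rate(hours, concentrations):
    """Estimate k (per hour) assuming C(t) = C0 * exp(-k * t).

    ASSUMPTION: first-order loss; the thesis may use a different model.
    Fits a straight line to ln(C) versus time by ordinary least squares
    and returns the negative slope.
    """
    logs = [math.log(c) for c in concentrations]
    n = len(hours)
    mean_t = sum(hours) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in hours)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(hours, logs))
    return -sxy / sxx


# Hypothetical hourly concentrations decaying at 0.3 per hour during rain.
hours = [0, 1, 2, 3, 4]
concentrations = [10.0 * math.exp(-0.3 * t) for t in hours]
k = first_order_removal_rate(hours, concentrations)
```

For real time series, concentrations would need to be strictly positive and the fit restricted to the precipitation period. Comparing the fitted rate against a dry-period baseline would help separate scavenging from other loss processes.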