A Statistical Methods-Based Novel Approach for Fully Automated Analysis of Chromatographic Data

dc.contributor.author: Kim, Sungwoo [en]
dc.contributor.committeechair: Isaacman-VanWertz, Gabriel [en]
dc.contributor.committeemember: Foroutan, Hosein [en]
dc.contributor.committeemember: Marr, Linsey C. [en]
dc.contributor.committeemember: Dietrich, Andrea M. [en]
dc.contributor.department: Civil and Environmental Engineering [en]
dc.date.accessioned: 2024-12-05T09:00:10Z [en]
dc.date.available: 2024-12-05T09:00:10Z [en]
dc.date.issued: 2024-12-04 [en]
dc.description.abstract: Atmospheric samples are complex mixtures that contain thousands of volatile organic compounds (VOCs) with diverse physicochemical properties and multiple isomers. These compounds can interact with nitrogen oxides, leading to the formation of ozone and particulate matter, which have detrimental effects on human health. Therefore, it is essential to apply effective analytical methods to obtain valuable information about the sources and transformation processes of these samples. Gas chromatography coupled with mass spectrometry (GC-MS) is a widely used method for the analysis of these complex mixtures due to its sensitivity and resolution. However, it presents significant challenges in data reduction and analyte identification due to the complexity and variability of atmospheric data. Traditional methods for processing large GC-MS datasets are highly time-consuming and may lead to the loss of potentially valuable information from relatively weak signals and to incomplete characterization of compounds. This study addresses these challenges. An automated approach is developed that catalogs and identifies nearly all analytes in large chromatographic datasets by combining factor analysis and a decision tree approach to deconvolute peaks. This approach was applied to data from the GoAmazon 2014/5 campaign and cataloged more than 1000 unique analytes. A novel method is then introduced to automatically identify quantification ions for single-ion chromatogram (SIC) based peak fitting and integration to generate time series of analytes. Through these combined approaches, a complex GC-MS dataset of atmospheric composition is reduced and processed fully automatically. Additionally, a machine learning-based dimensionality reduction algorithm was applied to the generated time series data for systematic characterization and categorization of both identified and unidentified compounds, clustering them into 8 distinct groups based on their temporal variation.
These data are then used to generate fundamental insight into the atmospheric processes that impact composition. This analysis aimed to elucidate the effects of meteorological conditions on these compounds, particularly the impact of wet deposition through precipitation scavenging on gas- and particle-phase oxygenated compounds. Hourly removal rates for all analytes were estimated by examining the impact of precipitation on their concentrations. [en]
dc.description.abstractgeneral: Atmospheric samples are made up of thousands of different volatile organic compounds (VOCs) with varying chemical properties and multiple forms, making them highly complex. These compounds can interact with nitrogen oxides, leading to the formation of ozone and particulate matter, which can have serious health effects. To better understand the sources and transformations of these compounds, it is crucial to use effective analytical methods. Gas chromatography coupled with mass spectrometry (GC-MS) is a powerful tool commonly used to analyze these complex mixtures due to its high sensitivity and ability to separate different compounds. However, the complex nature of atmospheric data poses challenges in analyzing and identifying the vast number of compounds present. Traditional methods for processing large GC-MS datasets are often time-consuming and may overlook potentially important but weak signals, resulting in incomplete identification of compounds. This study addresses these challenges by developing an automated method that efficiently catalogs and identifies almost all compounds in large GC-MS datasets. By combining factor analysis with a decision tree approach, the new method can separate overlapping signals and identify distinct compounds. This approach was applied to data from the GoAmazon 2014/5 campaign, successfully cataloging over 1,000 unique analytes. Additionally, a novel technique was introduced to automatically identify the best ions for quantifying each analyte and generate concentration time series data. The processed data were further analyzed using a machine learning algorithm to group both known and unknown analytes into 8 distinct categories based on their behavior over time. This analysis provided key insights into how atmospheric processes, especially weather conditions such as rainfall, affect the composition of these analytes.
The study estimated the rate at which different analytes were removed from the atmosphere by precipitation, shedding light on the impact of wet deposition on gas- and particle-phase compounds. [en]
dc.description.degree: Doctor of Philosophy [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:41544 [en]
dc.identifier.uri: https://hdl.handle.net/10919/123737 [en]
dc.language.iso: en [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Gas chromatography [en]
dc.subject: mass spectrometry [en]
dc.subject: positive matrix factorization [en]
dc.subject: dimensionality reduction [en]
dc.subject: spherical k-means [en]
dc.subject: wet deposition [en]
dc.title: A Statistical Methods-Based Novel Approach for Fully Automated Analysis of Chromatographic Data [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Civil Engineering [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: Doctor of Philosophy [en]

Files

Original bundle
Name: Kim_S_D_2024.pdf
Size: 5.73 MB
Format: Adobe Portable Document Format