Machine Learning for Structure-Agnostic Chemical Analysis from Chromatographic Data

DC Field | Value | Language
dc.contributor.author | Lahouar, Adam | en
dc.contributor.committeechair | Eldardiry, Hoda Mohamed | en
dc.contributor.committeemember | Isaacman-VanWertz, Gabriel | en
dc.contributor.committeemember | Yanardag Delul, Pinar | en
dc.contributor.department | Computer Science & Applications | en
dc.date.accessioned | 2026-02-04T09:00:30Z | en
dc.date.available | 2026-02-04T09:00:30Z | en
dc.date.issued | 2026-02-03 | en
dc.description.abstract | Environmental monitoring relies heavily on gas chromatography (GC) to measure airborne contaminants such as volatile organic compounds (VOCs), yet many detected compounds lack structural or spectral references, limiting identification, property estimation, and quantitative analysis. This thesis investigates how machine learning (ML) can extract chemically meaningful information directly from chromatographic data to overcome these limitations. First, ML models are developed to establish a bidirectional relationship between chromatographic retention behavior on orthogonal GC phases and key physicochemical properties (vapor pressure, Henry's law constant, and solubility). Using XGBoost regression models trained on the NIST retention index database, a structure-agnostic "Index-to-Property" model predicts physicochemical properties from paired retention indices, while a complementary "Property-to-Index" model predicts retention behavior from known properties, achieving predictive performance up to R² = 0.98. Second, this work demonstrates that compound identity and concentration can be inferred directly from chromatographic peak shape, bypassing manual peak integration. ML classification and regression models trained on peaks from ambient atmospheric samples achieve 89% identification accuracy and a mean absolute error of 0.085 ppbv in concentration prediction. Together, these results show that machine learning can address key identification and data reduction challenges in environmental GC, enabling faster, structure-independent interpretation of complex mixtures. | en
dc.description.abstractgeneral | Gas chromatography is an important method for monitoring air pollution, but many detected chemicals cannot be fully identified because reference information is missing or incomplete. This makes it difficult to understand what these compounds are, how they behave in the environment, and how much of them are present. This thesis explores how machine learning can help extract useful chemical information directly from chromatographic data. First, machine learning is used to relate chemical behavior in a gas chromatograph to important physical properties, allowing unknown compounds to be characterized without knowing their chemical structures. Second, machine learning is used to analyze the shape of chromatographic signals to identify compounds and estimate their concentrations automatically, reducing the need for time-consuming manual data processing. Overall, this research shows how machine learning can expand the capabilities of gas chromatography for environmental monitoring, improving both the speed and depth of chemical analysis over traditional methods. | en
dc.description.degree | Master of Science | en
dc.format.medium | ETD | en
dc.identifier.other | vt_gsexam:45632 | en
dc.identifier.uri | https://hdl.handle.net/10919/141131 | en
dc.language.iso | en | en
dc.publisher | Virginia Tech | en
dc.rights | In Copyright | en
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en
dc.subject | Gas Chromatography | en
dc.subject | Machine Learning | en
dc.subject | Compound Classification | en
dc.title | Machine Learning for Structure-Agnostic Chemical Analysis from Chromatographic Data | en
dc.type | Thesis | en
thesis.degree.discipline | Computer Science & Applications | en
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en
thesis.degree.level | masters | en
thesis.degree.name | Master of Science | en

Files

Original bundle
Name: Lahouar_A_T_2026.pdf
Size: 2.9 MB
Format: Adobe Portable Document Format