Machine Learning for Structure-Agnostic Chemical Analysis from Chromatographic Data

Lahouar, Adam

dc.contributor.author	Lahouar, Adam	en
dc.contributor.committeechair	Eldardiry, Hoda Mohamed	en
dc.contributor.committeemember	Isaacman-VanWertz, Gabriel	en
dc.contributor.committeemember	Yanardag Delul, Pinar	en
dc.contributor.department	Computer Science & Applications	en
dc.date.accessioned	2026-02-04T09:00:30Z	en
dc.date.available	2026-02-04T09:00:30Z	en
dc.date.issued	2026-02-03	en
dc.description.abstract	Environmental monitoring relies heavily on gas chromatography (GC) to measure airborne contaminants such as volatile organic compounds (VOCs), yet many detected compounds lack structural or spectral references, limiting identification, property estimation, and quantitative analysis. This thesis investigates how machine learning (ML) can extract chemically meaningful information directly from chromatographic data to overcome these limitations. First, ML models are developed to establish a bidirectional relationship between chromatographic retention behavior on orthogonal GC phases and key physicochemical properties (vapor pressure, Henry's law constant, and solubility). Using XGBoost regression models trained on the NIST retention index database, a structure-agnostic "Index-to-Property" model predicts physicochemical properties from paired retention indices, while a complementary "Property-to-Index" model predicts retention behavior from known properties, achieving predictive performance up to R^2=0.98. Second, this work demonstrates that compound identity and concentration can be inferred directly from chromatographic peak shape, bypassing manual peak integration. ML classification and regression models trained on peaks from ambient atmospheric samples achieve 89% identification accuracy and a mean absolute error of 0.085 ppbv in concentration prediction. Together, these results show that machine learning can address key identification and data reduction challenges in environmental GC, enabling faster, structure-independent interpretation of complex mixtures.	en
dc.description.abstractgeneral	Gas chromatography is an important method for monitoring air pollution, but many detected chemicals cannot be fully identified because reference information is missing or incomplete. This makes it difficult to understand what these compounds are, how they behave in the environment, and how much of each is present. This thesis explores how machine learning can help extract useful chemical information directly from chromatographic data. First, machine learning is used to relate chemical behavior in a gas chromatograph to important physical properties, allowing unknown compounds to be characterized without knowing their chemical structures. Second, machine learning is used to analyze the shape of chromatographic signals to identify compounds and estimate their concentrations automatically, reducing the need for time-consuming manual data processing. Overall, this research shows how machine learning can expand the capabilities of gas chromatography for environmental monitoring, improving both the speed and depth of chemical analysis over traditional methods.	en
dc.description.degree	Master of Science	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:45632	en
dc.identifier.uri	https://hdl.handle.net/10919/141131	en
dc.language.iso	en	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Gas Chromatography	en
dc.subject	Machine Learning	en
dc.subject	Compound Classification	en
dc.title	Machine Learning for Structure-Agnostic Chemical Analysis from Chromatographic Data	en
dc.type	Thesis	en
thesis.degree.discipline	Computer Science & Applications	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	masters	en
thesis.degree.name	Master of Science	en

Files

Original bundle

Name: Lahouar_A_T_2026.pdf
Size: 2.9 MB
Format: Adobe Portable Document Format

Collections

Masters Theses