Machine Learning for Structure-Agnostic Chemical Analysis from Chromatographic Data
| dc.contributor.author | Lahouar, Adam | en |
| dc.contributor.committeechair | Eldardiry, Hoda Mohamed | en |
| dc.contributor.committeemember | Isaacman-VanWertz, Gabriel | en |
| dc.contributor.committeemember | Yanardag Delul, Pinar | en |
| dc.contributor.department | Computer Science & Applications | en |
| dc.date.accessioned | 2026-02-04T09:00:30Z | en |
| dc.date.available | 2026-02-04T09:00:30Z | en |
| dc.date.issued | 2026-02-03 | en |
| dc.description.abstract | Environmental monitoring relies heavily on gas chromatography (GC) to measure airborne contaminants such as volatile organic compounds (VOCs), yet many detected compounds lack structural or spectral references, limiting identification, property estimation, and quantitative analysis. This thesis investigates how machine learning (ML) can extract chemically meaningful information directly from chromatographic data to overcome these limitations. First, ML models are developed to establish a bidirectional relationship between chromatographic retention behavior on orthogonal GC phases and key physicochemical properties (vapor pressure, Henry's law constant, and solubility). Using XGBoost regression models trained on the NIST retention index database, a structure-agnostic "Index-to-Property" model predicts physicochemical properties from paired retention indices, while a complementary "Property-to-Index" model predicts retention behavior from known properties, achieving predictive performance up to R^2=0.98. Second, this work demonstrates that compound identity and concentration can be inferred directly from chromatographic peak shape, bypassing manual peak integration. ML classification and regression models trained on peaks from ambient atmospheric samples achieve 89% identification accuracy and a mean absolute error of 0.085 ppbv in concentration prediction. Together, these results show that machine learning can address key identification and data reduction challenges in environmental GC, enabling faster, structure-independent interpretation of complex mixtures. | en |
| dc.description.abstractgeneral | Gas chromatography is an important method for monitoring air pollution, but many detected chemicals cannot be fully identified because reference information is missing or incomplete. This makes it difficult to understand what these compounds are, how they behave in the environment, and how much of each is present. This thesis explores how machine learning can help extract useful chemical information directly from chromatographic data. First, machine learning is used to relate chemical behavior in a gas chromatograph to important physical properties, allowing unknown compounds to be characterized without knowing their chemical structures. Second, machine learning is used to analyze the shape of chromatographic signals to identify compounds and estimate their concentrations automatically, reducing the need for time-consuming manual data processing. Overall, this research shows how machine learning can expand the capabilities of gas chromatography for environmental monitoring, improving both the speed and depth of chemical analysis over traditional methods. | en |
| dc.description.degree | Master of Science | en |
| dc.format.medium | ETD | en |
| dc.identifier.other | vt_gsexam:45632 | en |
| dc.identifier.uri | https://hdl.handle.net/10919/141131 | en |
| dc.language.iso | en | en |
| dc.publisher | Virginia Tech | en |
| dc.rights | In Copyright | en |
| dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
| dc.subject | Gas Chromatography | en |
| dc.subject | Machine Learning | en |
| dc.subject | Compound Classification | en |
| dc.title | Machine Learning for Structure-Agnostic Chemical Analysis from Chromatographic Data | en |
| dc.type | Thesis | en |
| thesis.degree.discipline | Computer Science & Applications | en |
| thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
| thesis.degree.level | masters | en |
| thesis.degree.name | Master of Science | en |