Machine Learning for Structure-Agnostic Chemical Analysis from Chromatographic Data

Lahouar, Adam

Files

Lahouar_A_T_2026.pdf (2.9 MB)

Date

2026-02-03

Authors

Lahouar, Adam

Publisher

Virginia Tech

Abstract

Environmental monitoring relies heavily on gas chromatography (GC) to measure airborne contaminants such as volatile organic compounds (VOCs), yet many detected compounds lack structural or spectral references, limiting identification, property estimation, and quantitative analysis. This thesis investigates how machine learning (ML) can extract chemically meaningful information directly from chromatographic data to overcome these limitations. First, ML models are developed to establish a bidirectional relationship between chromatographic retention behavior on orthogonal GC phases and key physicochemical properties (vapor pressure, Henry's law constant, and solubility). Using XGBoost regression models trained on the NIST retention index database, a structure-agnostic "Index-to-Property" model predicts physicochemical properties from paired retention indices, while a complementary "Property-to-Index" model predicts retention behavior from known properties, achieving predictive performance up to R^2=0.98. Second, this work demonstrates that compound identity and concentration can be inferred directly from chromatographic peak shape, bypassing manual peak integration. ML classification and regression models trained on peaks from ambient atmospheric samples achieve 89% identification accuracy and a mean absolute error of 0.085 ppbv in concentration prediction. Together, these results show that machine learning can address key identification and data reduction challenges in environmental GC, enabling faster, structure-independent interpretation of complex mixtures.

Keywords

Gas Chromatography, Machine Learning, Compound Classification

Persistent link

https://hdl.handle.net/10919/141131

Collections

Masters Theses
