Comprehensive detection of analytes in large chromatographic datasets by coupling factor analysis with a decision tree

dc.contributor.author | Kim, Sungwoo | en
dc.contributor.author | Lerner, Brian M. | en
dc.contributor.author | Sueper, Donna T. | en
dc.contributor.author | Isaacman-VanWertz, Gabriel | en
dc.date.accessioned | 2022-10-07T12:48:04Z | en
dc.date.available | 2022-10-07T12:48:04Z | en
dc.date.issued | 2022-09-05 | en
dc.description.abstract | Environmental samples typically contain hundreds or thousands of unique organic compounds, and even minor components may provide valuable insight into their sources and transformations. To understand atmospheric processes, individual components are frequently identified and quantified using gas chromatography-mass spectrometry. However, due to the complexity and frequently variable nature of such data, data reduction is a significant bottleneck in analysis. Consequently, only a subset of known analytes is often reported for a dataset, and large amounts of potentially useful data are discarded. We present an automated approach of cataloging and potentially identifying all analytes in a large chromatographic dataset and demonstrate the utility of our approach in an analysis of ambient aerosols. We use a coupled factor analysis-decision tree approach to deconvolute peaks and comprehensively catalog nearly all analytes in a dataset. Positive matrix factorization (PMF) of small subsections of multiple chromatograms is applied to extract factors that represent chromatographic profiles and mass spectra of potential analytes, in which peaks are detected. A decision tree based on peak parameters (e.g., location, width, and height), relative ratios of those parameters, peak shape, noise, retention time, and mass spectrum is applied to discard erroneous peaks and combine peaks determined to represent the same analyte. With our approach, all analytes within the small section of the chromatogram are cataloged, and the process is repeated for overlapping sections across the chromatogram, generating a complete list of the retention times and estimated mass spectra of all peaks in a dataset. We validate this approach using samples of known compounds and demonstrate the separation of poorly resolved peaks with similar mass spectra and the resolution of peaks that appear in only a fraction of chromatograms. As a case study, this method is applied to a complex real-world dataset of the composition of atmospheric particles, in which more than 1100 unique chromatographic peaks are resolved, and the corresponding peak information along with mass spectra are cataloged. | en
dc.description.notes | This work was supported by the National Oceanic and Atmospheric Administration Small Business Innovative Research Program (WC133R18CN0064 and NA21OAR0210294). We thank Chenyang Bi for assistance with generating laboratory data and Allen Goldstein for sharing ambient data. | en
dc.description.sponsorship | National Oceanic and Atmospheric Administration Small Business Innovative Research Program [WC133R18CN0064, NA21OAR0210294] | en
dc.description.version | Published version | en
dc.format.mimetype | application/pdf | en
dc.identifier.doi | https://doi.org/10.5194/amt-15-5061-2022 | en
dc.identifier.eissn | 1867-8548 | en
dc.identifier.issn | 1867-1381 | en
dc.identifier.issue | 17 | en
dc.identifier.uri | http://hdl.handle.net/10919/112099 | en
dc.identifier.volume | 15 | en
dc.language.iso | en | en
dc.publisher | Copernicus | en
dc.rights | Creative Commons Attribution 4.0 International | en
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en
dc.subject | 2-dimensional gas-chromatography | en
dc.subject | positive matrix factorization | en
dc.subject | mass-spectral deconvolution | en
dc.subject | organic-compounds | en
dc.subject | identification system | en
dc.subject | part 1 | en
dc.subject | aerosol | en
dc.subject | instrument | en
dc.subject | resolution | en
dc.subject | model | en
dc.title | Comprehensive detection of analytes in large chromatographic datasets by coupling factor analysis with a decision tree | en
dc.title.serial | Atmospheric Measurement Techniques | en
dc.type | Article - Refereed | en
dc.type.dcmitype | Text | en
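
Illustrative note on the abstract: the second stage described there (discarding erroneous peaks and combining peaks that represent the same analyte) can be pictured with a small rule-based filter. The sketch below is not the authors' implementation. The `Peak` fields, the function names, and every threshold (signal-to-noise of 3, width limits, retention-time tolerance, cosine-similarity cutoff) are hypothetical values chosen only for illustration, and the factor-analysis (PMF) stage that would produce the candidate peaks is not shown.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Peak:
    rt: float        # retention time (s)
    width: float     # peak width (s)
    height: float    # peak height (arbitrary units)
    noise: float     # local baseline noise (same units as height)
    spectrum: dict   # m/z -> relative intensity


def cosine(a, b):
    """Cosine similarity between two sparse mass spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def keep_peak(p, min_snr=3.0, min_width=1.0, max_width=20.0):
    """Hypothetical rules: reject low signal-to-noise or implausible widths."""
    if p.noise <= 0 or p.height / p.noise < min_snr:
        return False
    return min_width <= p.width <= max_width


def same_analyte(p, q, rt_tol=1.0, min_sim=0.9):
    """Hypothetical rule: close retention times AND similar mass spectra."""
    return abs(p.rt - q.rt) <= rt_tol and cosine(p.spectrum, q.spectrum) >= min_sim


def catalog(peaks):
    """Filter candidate peaks, then merge duplicates, keeping the tallest."""
    kept = sorted((p for p in peaks if keep_peak(p)), key=lambda p: p.rt)
    out = []
    for p in kept:
        for i, q in enumerate(out):
            if same_analyte(p, q):
                if p.height > q.height:
                    out[i] = p
                break
        else:
            out.append(p)
    return out


# Made-up candidate peaks, e.g. from two overlapping chromatogram sections.
peaks = [
    Peak(100.0, 4.0, 500.0, 10.0, {57: 1.0, 71: 0.6}),
    Peak(100.5, 4.2, 450.0, 10.0, {57: 1.0, 71: 0.55}),  # same analyte, seen twice
    Peak(101.0, 4.0, 300.0, 10.0, {91: 1.0, 105: 0.4}),  # co-eluting, different spectrum
    Peak(150.0, 4.0, 15.0, 10.0, {43: 1.0}),             # too close to the noise level
]
result = catalog(peaks)
```

With these made-up inputs, the duplicate at 100.5 s is merged into the peak at 100.0 s, the co-eluting peak at 101.0 s is kept because its spectrum differs, and the low signal-to-noise peak at 150.0 s is discarded.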

Files

Original bundle
Name: amt-15-5061-2022.pdf
Size: 4.66 MB
Format: Adobe Portable Document Format
Description: Published version