Generalizing predictions to unseen sequencing profiles via deep generative models

dc.contributor.author: Oh, Min [en]
dc.contributor.author: Zhang, Liqing [en]
dc.date.accessioned: 2022-06-16T12:40:44Z [en]
dc.date.available: 2022-06-16T12:40:44Z [en]
dc.date.issued: 2022-05-03 [en]
dc.description.abstract: Predictive models trained on sequencing profiles often fail to achieve expected performance when externally validated on unseen profiles. While many factors such as batch effects, small data sets, and technical errors contribute to the gap between source and unseen data distributions, it is a challenging problem to generalize the predictive models across studies without any prior knowledge of the unseen data distribution. Here, this study proposes DeepBioGen, a sequencing profile augmentation procedure that characterizes visual patterns of sequencing profiles, generates realistic profiles based on a deep generative model capturing the patterns, and generalizes the subsequent classifiers. DeepBioGen outperforms other methods in terms of enhancing the generalizability of the prediction models on unseen data. The generalized classifiers surpass the state-of-the-art method, evaluated on RNA sequencing tumor expression profiles for anti-PD1 therapy response prediction and WGS human gut microbiome profiles for type 2 diabetes diagnosis. [en]
dc.description.notes: This work is partially supported by VT's OASF support. [en]
dc.description.sponsorship: VT's OASF [en]
dc.description.version: Published version [en]
dc.format.mimetype: application/pdf [en]
dc.identifier.doi: https://doi.org/10.1038/s41598-022-11363-w [en]
dc.identifier.issn: 2045-2322 [en]
dc.identifier.issue: 1 [en]
dc.identifier.other: 7151 [en]
dc.identifier.pmid: 35504956 [en]
dc.identifier.uri: http://hdl.handle.net/10919/110801 [en]
dc.identifier.volume: 12 [en]
dc.language.iso: en [en]
dc.publisher: Nature Portfolio [en]
dc.rights: Creative Commons Attribution 4.0 International [en]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [en]
dc.subject: validation [en]
dc.subject: metagenome [en]
dc.title: Generalizing predictions to unseen sequencing profiles via deep generative models [en]
dc.title.serial: Scientific Reports [en]
dc.type: Article - Refereed [en]
dc.type.dcmitype: Text [en]

Files

Original bundle
Name: s41598-022-11363-w.pdf
Size: 1.55 MB
Format: Adobe Portable Document Format
Description: Published version