Comparative assessment and novel strategy on methods for imputing proteomics data

dc.contributor.author | Shen, Minjie | en
dc.contributor.author | Chang, Yi-Tan | en
dc.contributor.author | Wu, Chiung-Ting | en
dc.contributor.author | Parker, Sarah J. | en
dc.contributor.author | Saylor, Georgia | en
dc.contributor.author | Wang, Yizhi | en
dc.contributor.author | Yu, Guoqiang | en
dc.contributor.author | Van Eyk, Jennifer E. | en
dc.contributor.author | Clarke, Robert | en
dc.contributor.author | Herrington, David M. | en
dc.contributor.author | Wang, Yue | en
dc.date.accessioned | 2022-02-19T20:02:22Z | en
dc.date.available | 2022-02-19T20:02:22Z | en
dc.date.issued | 2022-01-20 | en
dc.date.updated | 2022-02-19T20:02:16Z | en
dc.description.abstract | Missing values are a major issue in quantitative proteomics analysis. While many methods have been developed for imputing missing values in high-throughput proteomics data, a comparative assessment of imputation accuracy remains inconclusive, mainly because the mechanisms contributing to true missing values are complex and existing evaluation methodologies are imperfect. Moreover, few studies have provided an outlook on future methodological development. We first re-evaluate the performance of eight representative methods targeting three typical missing mechanisms. These methods are compared on both simulated and masked missing values embedded within real proteomics datasets, and performance is evaluated using three quantitative measures. We then introduce fused regularization matrix factorization, a low-rank global matrix factorization framework capable of integrating local similarity derived from additional data types. We also explore a biologically inspired latent variable modeling strategy, convex analysis of mixtures, for missing value imputation and present preliminary experimental results. While some winners emerged from our comparative assessment, the evaluation is intrinsically imperfect because performance is evaluated indirectly on artificial missing or masked values rather than on authentic missing values. Nevertheless, we show that our fused regularization matrix factorization provides a novel incorporation of external and local information, and the exploratory implementation of convex analysis of mixtures presents a biologically plausible new approach. | en
dc.description.version | Published version | en
dc.format.extent | Pages 1067 | en
dc.format.mimetype | application/pdf | en
dc.identifier.doi | https://doi.org/10.1038/s41598-022-04938-0 | en
dc.identifier.eissn | 2045-2322 | en
dc.identifier.issn | 2045-2322 | en
dc.identifier.issue | 1 | en
dc.identifier.orcid | Wang, Yue [0000-0002-1788-1102] | en
dc.identifier.other | 10.1038/s41598-022-04938-0 (PII) | en
dc.identifier.pmid | 35058491 | en
dc.identifier.uri | http://hdl.handle.net/10919/108765 | en
dc.identifier.volume | 12 | en
dc.language.iso | en | en
dc.relation.uri | https://www.ncbi.nlm.nih.gov/pubmed/35058491 | en
dc.rights | Creative Commons Attribution 4.0 International | en
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en
dc.title | Comparative assessment and novel strategy on methods for imputing proteomics data | en
dc.title.serial | Scientific Reports | en
dc.type | Article - Refereed | en
dc.type.dcmitype | Text | en
dc.type.other | Journal Article | en
dcterms.dateAccepted | 2022-01-04 | en
pubs.organisational-group | /Virginia Tech | en
pubs.organisational-group | /Virginia Tech/Engineering | en
pubs.organisational-group | /Virginia Tech/Engineering/Electrical and Computer Engineering | en
pubs.organisational-group | /Virginia Tech/Faculty of Health Sciences | en
pubs.organisational-group | /Virginia Tech/All T&R Faculty | en
pubs.organisational-group | /Virginia Tech/Engineering/COE T&R Faculty | en

Files

Original bundle
Name: Comparative assessment and novel strategy on methods for imputing proteomics data.pdf
Size: 3.05 MB
Format: Adobe Portable Document Format
Description: Published version