Modifying the Asynchronous Jacobi Method for Data Corruption Resilience

dc.contributor.authorVogl, Christopher J.en
dc.contributor.authorAtkins, Zachary R.en
dc.contributor.authorFox, Alysonen
dc.contributor.authorMiedlar, Agnieszkaen
dc.contributor.authorPonce, Colinen
dc.date.accessioned2025-02-05T15:49:15Zen
dc.date.available2025-02-05T15:49:15Zen
dc.date.issued2024-10-09en
dc.description.abstractMoving scientific computation from high-performance computing (HPC) and cloud computing (CC) environments to devices on the edge, i.e., physically near instruments of interest, has received tremendous interest in recent years. Such edge computing environments can operate on data in situ, offering enticing benefits over data aggregation to HPC and CC facilities that include avoiding costs of transmission, increased data privacy, and real-time data analysis. Because of the inherent unreliability of edge computing environments, new fault-tolerant approaches must be developed before the benefits of edge computing can be realized. Motivated by algorithm-based fault tolerance, a variant of the asynchronous Jacobi (ASJ) method is developed that achieves resilience to data corruption by rejecting solution approximations from neighbor devices according to a bound derived from convergence theory. Numerical results on a two-dimensional Poisson problem show that the new rejection criterion, along with a novel approximation to the shortest path length on which the criterion depends, restores convergence for the ASJ variant in the presence of certain types data corruption. Numerical results are obtained for when the singular values in the analytic bound are approximated. Additional linear systems are also explored, one with a more dense sparsity pattern and one that includes advection. All results indicate that successful resilience to data corruption depends on whether the bound tightens fast enough to reject corrupted data before the iteration evolution deviates significantly from that predicted by the convergence theory defining the bound. This observation generalizes to future work on algorithm-based fault tolerance for other asynchronous algorithms, including upcoming approaches that leverage Krylov subspaces.en
dc.description.versionPublished versionen
dc.format.extentPages A3258-A3281en
dc.format.extent24 page(s)en
dc.format.mimetypeapplication/pdfen
dc.identifier.doihttps://doi.org/10.1137/23M1605648en
dc.identifier.eissn1095-7197en
dc.identifier.issn1064-8275en
dc.identifier.issue5en
dc.identifier.orcidMiedlar, Agnieszka [0000-0002-2995-7426]en
dc.identifier.urihttps://hdl.handle.net/10919/124504en
dc.identifier.volume46en
dc.language.isoenen
dc.publisherSociety for Industrial and Applied Mathematicsen
dc.rightsCreative Commons Attribution 4.0 Internationalen
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/en
dc.subjectdistributed computingen
dc.subjectasynchronous methodsen
dc.subjectJacobien
dc.subjectdata corruptionen
dc.titleModifying the Asynchronous Jacobi Method for Data Corruption Resilienceen
dc.title.serialSIAM Journal on Scientific Computingen
dc.typeArticle - Refereeden
dc.type.dcmitypeTexten
dc.type.otherArticleen
dc.type.otherJournalen
pubs.organisational-groupVirginia Techen
pubs.organisational-groupVirginia Tech/Scienceen
pubs.organisational-groupVirginia Tech/Science/Mathematicsen
pubs.organisational-groupVirginia Tech/All T&R Facultyen
pubs.organisational-groupVirginia Tech/Science/COS T&R Facultyen

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
VogAFMP24_Modyfying_the_Asynchronous_Jacobi_Method.pdf
Size:
2.89 MB
Format:
Adobe Portable Document Format
Description:
Published version
License bundle
Now showing 1 - 1 of 1
Name:
license.txt
Size:
1.5 KB
Format:
Plain Text
Description: