GEMS: A Fault Tolerant Grid Job Management System

dc.contributor.author: Tadepalli, Sriram Satish (en)
dc.contributor.committeechair: Ribbens, Calvin J. (en)
dc.contributor.committeemember: Kafura, Dennis G. (en)
dc.contributor.committeemember: Varadarajan, Srinidhi (en)
dc.contributor.department: Computer Science (en)
dc.date.accessioned: 2011-08-06T14:44:47Z (en)
dc.date.adate: 2004-01-08 (en)
dc.date.available: 2011-08-06T14:44:47Z (en)
dc.date.issued: 2003-12-19 (en)
dc.date.rdate: 2005-01-08 (en)
dc.date.sdate: 2003-12-29 (en)
dc.description.abstract: Grid environments are inherently unstable. Resources join and leave the environment without prior notification. Application fault detection, checkpointing, and restart are of foremost importance in Grid environments. The need for fault tolerance is especially acute for large parallel applications, since the failure rate grows with the number of processors and the duration of the computation. A Grid job management system hides the heterogeneity of the Grid and the complexity of the Grid protocols from the user. The user submits a job to the Grid job management system, which finds an appropriate resource, submits the job, and transfers the output files to the user upon job completion. However, current Grid job management systems do not detect application failures. The goal of this research is to develop a Grid job management system that can efficiently detect application failures. Failed jobs are either restarted on the same resource or migrated to another resource and restarted there. The research also aims to identify the role of local resource managers in the fault detection and migration of Grid applications. (en)
dc.description.degree: Master of Science (en)
dc.format.medium: ETD (en)
dc.identifier.other: etd-12292003-134023 (en)
dc.identifier.sourceurl: http://scholar.lib.vt.edu/theses/available/etd-12292003-134023 (en)
dc.identifier.uri: http://hdl.handle.net/10919/9661 (en)
dc.publisher: Virginia Tech (en)
dc.relation.haspart: thesis.pdf (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: fault tolerance (en)
dc.subject: grid computing (en)
dc.subject: grid job management systems (en)
dc.subject: local resource manager (en)
dc.subject: job migration (en)
dc.title: GEMS: A Fault Tolerant Grid Job Management System (en)
dc.type: Thesis (en)
thesis.degree.discipline: Computer Science (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.level: masters (en)
thesis.degree.name: Master of Science (en)

Files

Original bundle
Name: thesis.pdf
Size: 320.47 KB
Format: Adobe Portable Document Format
