Automated Assessment of Student-written Tests Based on Defect-detection Capability

dc.contributor.author: Shams, Zalia
dc.contributor.committeechair: Edwards, Stephen H.
dc.contributor.committeemember: Perez-Quinones, Manuel A.
dc.contributor.committeemember: Offutt, Jeff
dc.contributor.committeemember: Kafura, Dennis G.
dc.contributor.committeemember: Tilevich, Eli
dc.contributor.department: Computer Science
dc.date.accessioned: 2015-05-06T08:00:26Z
dc.date.available: 2015-05-06T08:00:26Z
dc.date.issued: 2015-05-05
dc.description.abstract: Software testing is important, but judging whether a set of software tests is effective is difficult. This problem also appears in the classroom as educators more frequently include software testing activities in programming assignments. The most common measures used to assess student-written software tests are coverage criteria, which track how much of the student's code (in terms of statements or branches) is exercised by the corresponding tests. However, coverage criteria have limitations and sometimes overestimate the true quality of the tests. This dissertation investigates alternative measures of test quality based on how many defects the tests can detect, either in code written by other students (all-pairs execution) or in artificially injected changes (mutation analysis). We also investigate a new potential measure called checked code coverage, which calculates coverage from the dynamic backward slices of test oracles, i.e., all statements that contribute to the checked result of any test. Adopting these alternative approaches in automated classroom grading systems requires overcoming a number of technical challenges. This research addresses those challenges and experimentally compares the different methods in terms of how well they predict the defect-detection capability of student-written tests when run against over 36,500 known, authentic, human-written errors. For data collection, we use CS2 assignments and evaluate students' tests with 10 different measures: all-pairs execution, mutation testing with four different sets of mutation operators, checked code coverage, and four coverage criteria. Experimental results encompassing 1,971,073 test runs show that all-pairs execution is the most accurate predictor of the underlying defect-detection capability of a test suite. The second-best predictor is mutation analysis with the statement deletion operator. Further, no strong correlation was found between defect-detection capability and the coverage measures.
dc.description.degree: Ph. D.
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:5068
dc.identifier.uri: http://hdl.handle.net/10919/52024
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Software Testing
dc.subject: Automated Assessment
dc.subject: All-pairs Execution
dc.subject: Mutation Testing
dc.subject: Coverage Criteria
dc.subject: Defect-detection Capability
dc.title: Automated Assessment of Student-written Tests Based on Defect-detection Capability
dc.type: Dissertation
thesis.degree.discipline: Computer Science and Applications
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: doctoral
thesis.degree.name: Ph. D.
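
The all-pairs execution measure summarized in the abstract above can be pictured with a short sketch: each student's test suite is run against every other student's submission, and the suite is scored by the fraction of peer submissions on which it reveals at least one failure. The Java sketch below is illustrative only; the class name, the runTests predicate, and the toy data are hypothetical stand-ins rather than the dissertation's actual grading infrastructure, and Java is assumed simply as a typical CS2 course language.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Minimal sketch of all-pairs execution scoring (illustrative assumption,
// not the dissertation's implementation): run each suite against every
// peer solution and record the fraction of peers on which it fails.
public class AllPairsSketch {

    static Map<String, Double> score(Map<String, String> solutions,
                                     Map<String, String> testSuites,
                                     BiPredicate<String, String> runTests) {
        Map<String, Double> detectionRate = new HashMap<>();
        for (Map.Entry<String, String> tester : testSuites.entrySet()) {
            int peers = 0;
            int detected = 0;
            for (Map.Entry<String, String> target : solutions.entrySet()) {
                if (target.getKey().equals(tester.getKey())) {
                    continue; // a suite is not scored against its author's own solution
                }
                peers++;
                // runTests returns true when the suite passes on the given solution;
                // a failure on a peer solution counts as a detected defect.
                if (!runTests.test(tester.getValue(), target.getValue())) {
                    detected++;
                }
            }
            detectionRate.put(tester.getKey(), peers == 0 ? 0.0 : (double) detected / peers);
        }
        return detectionRate;
    }

    public static void main(String[] args) {
        // Toy data: strings stand in for compiled solutions and test suites.
        Map<String, String> solutions = Map.of("alice", "solA", "bob", "solB", "carol", "solC");
        Map<String, String> suites = Map.of("alice", "testsA", "bob", "testsB", "carol", "testsC");
        // Pretend every suite fails on Bob's solution and passes on the others.
        BiPredicate<String, String> fakeRun = (tests, solution) -> !solution.equals("solB");
        System.out.println(score(solutions, suites, fakeRun));
    }
}
```

In a real grading pipeline the per-suite detection rate would come from actual test executions rather than a stubbed predicate; how the score is then folded into a grade is outside the scope of this sketch.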

Files

Original bundle
Name: Shams_Z_D_2015.pdf
Size: 2.93 MB
Format: Adobe Portable Document Format