Automated Assessment of Student-written Tests Based on Defect-detection Capability

dc.contributor.author: Shams, Zalia
dc.contributor.committeechair: Edwards, Stephen H.
dc.contributor.committeemember: Perez-Quinones, Manuel A.
dc.contributor.committeemember: Offutt, Jeff
dc.contributor.committeemember: Kafura, Dennis G.
dc.contributor.committeemember: Tilevich, Eli
dc.contributor.department: Computer Science
dc.date.accessioned: 2015-05-06T08:00:26Z
dc.date.available: 2015-05-06T08:00:26Z
dc.date.issued: 2015-05-05
dc.description.abstract: Software testing is important, but judging whether a set of software tests is effective is difficult. This problem also appears in the classroom as educators more frequently include software testing activities in programming assignments. The most common measures used to assess student-written software tests are coverage criteria, which track how much of the student's code (in terms of statements or branches) is exercised by the corresponding tests. However, coverage criteria have limitations and sometimes overestimate the true quality of the tests. This dissertation investigates alternative measures of test quality based on how many defects the tests can detect, either in code written by other students (all-pairs execution) or in artificially injected changes (mutation analysis). We also investigate a new potential measure called checked code coverage, which calculates coverage from the dynamic backward slices of test oracles, i.e., all statements that contribute to the checked result of any test. Adopting these alternative approaches in automated classroom grading systems requires overcoming a number of technical challenges. This research addresses those challenges and experimentally compares the different methods in terms of how well they predict the defect-detection capability of student-written tests when run against over 36,500 known, authentic, human-written errors. For data collection, we use CS2 assignments and evaluate students' tests with 10 different measures: all-pairs execution, mutation testing with four different sets of mutation operators, checked code coverage, and four coverage criteria. Experimental results encompassing 1,971,073 test runs show that all-pairs execution is the most accurate predictor of the underlying defect-detection capability of a test suite. The second-best predictor is mutation analysis with the statement deletion operator. Further, no strong correlation was found between defect-detection capability and the coverage measures.
dc.description.degree: Ph. D.
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:5068
dc.identifier.uri: http://hdl.handle.net/10919/52024
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Software Testing
dc.subject: Automated Assessment
dc.subject: All-pairs Execution
dc.subject: Mutation Testing
dc.subject: Coverage Criteria
dc.subject: Defect-detection Capability
dc.title: Automated Assessment of Student-written Tests Based on Defect-detection Capability
dc.type: Dissertation
thesis.degree.discipline: Computer Science and Applications
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: doctoral
thesis.degree.name: Ph. D.
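
The all-pairs execution measure summarized in the abstract above can be pictured with a short sketch: each student's test suite is run against every other student's submission, and the suite is scored by the fraction of peer submissions on which it reveals at least one failure. The Java sketch below is illustrative only; the class name, the runTests predicate, and the toy data are hypothetical stand-ins rather than the dissertation's actual grading infrastructure, and Java is assumed simply as a typical CS2 course language.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Minimal sketch of all-pairs execution scoring (illustrative assumption,
// not the dissertation's implementation): run each suite against every
// peer solution and record the fraction of peers on which it fails.
public class AllPairsSketch {

    static Map<String, Double> score(Map<String, String> solutions,
                                     Map<String, String> testSuites,
                                     BiPredicate<String, String> runTests) {
        Map<String, Double> detectionRate = new HashMap<>();
        for (Map.Entry<String, String> tester : testSuites.entrySet()) {
            int peers = 0;
            int detected = 0;
            for (Map.Entry<String, String> target : solutions.entrySet()) {
                if (target.getKey().equals(tester.getKey())) {
                    continue; // a suite is not scored against its author's own solution
                }
                peers++;
                // runTests returns true when the suite passes on the given solution;
                // a failure on a peer solution counts as a detected defect.
                if (!runTests.test(tester.getValue(), target.getValue())) {
                    detected++;
                }
            }
            detectionRate.put(tester.getKey(), peers == 0 ? 0.0 : (double) detected / peers);
        }
        return detectionRate;
    }

    public static void main(String[] args) {
        // Toy data: strings stand in for compiled solutions and test suites.
        Map<String, String> solutions = Map.of("alice", "solA", "bob", "solB", "carol", "solC");
        Map<String, String> suites = Map.of("alice", "testsA", "bob", "testsB", "carol", "testsC");
        // Pretend every suite fails on Bob's solution and passes on the others.
        BiPredicate<String, String> fakeRun = (tests, solution) -> !solution.equals("solB");
        System.out.println(score(solutions, suites, fakeRun));
    }
}
```

In a real grading pipeline the per-suite detection rate would come from actual test executions rather than a stubbed predicate; how the score is then folded into a grade is outside the scope of this sketch.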

Files

Original bundle
Name: Shams_Z_D_2015.pdf
Size: 2.93 MB
Format: Adobe Portable Document Format