The effect of denormalized schemas on ad-hoc query formulation: a human factors experiment in database design

Virginia Polytechnic Institute and State University

The information systems literature is rich with studies of database organization and its impact on machine, programmer, and administrative efficiency. Little attention, however, has been paid to the impact of database organization on end-user interactions with computer systems. This research effort addressed this increasingly important issue by examining the effects of database organization on the ability of end-users to locate and extract desired information.

The study examined the impact of the normalization level of external relational database schemas on end-user query success. It has been suggested in the literature that end-user query success might be improved by presenting external schemas in lower-level normal forms. This speculation is based on an analytical study of one particular class of query: queries involving join operations. The research presented here provides empirical support for this assertion. However, the implicit assumption that all other queries are neutral with respect to the level of normalization was found to be false. A class of queries requiring decomposition of prejoined relations was identified which strongly favors normalized relations. Thus, no particular normalization level was shown to dominate unless assumptions were made about the class of query being formulated. Evidence from field research may be required to completely resolve the issue.
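
The contrast between the two query classes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tables `department`, `employee`, and `emp_dept`, their columns, and the sample rows are hypothetical and are not taken from the experiment, and reading "decomposition" as recovering distinct department facts from a prejoined relation is only one plausible interpretation. The sketch uses Python's standard `sqlite3` module.

```python
import sqlite3


def build_demo():
    """Create a hypothetical normalized schema and a prejoined copy of it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Normalized external schema (hypothetical).
        CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
        CREATE TABLE employee (
            emp_id INTEGER PRIMARY KEY,
            emp_name TEXT,
            dept_id INTEGER REFERENCES department(dept_id)
        );
        INSERT INTO department VALUES (1, 'Sales'), (2, 'Research');
        INSERT INTO employee VALUES (10, 'Ada', 1), (11, 'Ben', 1), (12, 'Cy', 2);

        -- Denormalized (prejoined) external schema built from the same data.
        CREATE TABLE emp_dept AS
            SELECT e.emp_id, e.emp_name, d.dept_id, d.dept_name
            FROM employee e JOIN department d ON e.dept_id = d.dept_id;
        """
    )
    return conn


def run(conn, sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


conn = build_demo()

# Join-class query: "list each employee with the name of their department".
# The normalized schema requires the user to formulate a join ...
join_normalized = run(
    conn,
    "SELECT e.emp_name, d.dept_name "
    "FROM employee e JOIN department d ON e.dept_id = d.dept_id "
    "ORDER BY e.emp_id",
)
# ... while the prejoined schema needs only a single-table selection.
join_prejoined = run(
    conn, "SELECT emp_name, dept_name FROM emp_dept ORDER BY emp_id"
)

# Decomposition-class query: "how many departments are there?".
# The normalized schema answers it directly.
count_normalized = run(conn, "SELECT COUNT(*) FROM department")[0][0]
# On the prejoined schema a naive count tallies employee rows instead ...
count_naive = run(conn, "SELECT COUNT(*) FROM emp_dept")[0][0]
# ... so the user must first undo the join, here with DISTINCT.
count_prejoined = run(
    conn, "SELECT COUNT(DISTINCT dept_id) FROM emp_dept"
)[0][0]
```

In this sketch the join-class query is simpler against the prejoined relation, while the decomposition-class query is simpler against the normalized relations, and the naive formulation against `emp_dept` silently returns the wrong count.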

The study also examined the interaction effects between normalization levels and other key variables known to impact query success. Significant interactions with user skill and the complexity of the query being formulated were found. The level of normalization did not affect highly skilled users making easy queries or low-skilled users making difficult queries. The impact of these interactions, as well as the main effects of the related variables, on query syntax and logic errors holds important implications for database administrators as well as those involved with the development of database query languages.