Beyond the Checkbox: Leveraging AI Chatbots for Inclusive Demographic Data Collection

Date

2025-09-19

Publisher

Virginia Tech

Abstract

Traditional demographic surveys compress rich identities into rigid checkboxes. This dissertation asks whether a conversational chatbot, powered by GPT-4o, can restore that nuance. In a within-subjects experiment, 230 participants completed both a chatbot conversation and the standard Office of Management and Budget (OMB) form. Exploratory analyses showed that participants' open-ended narratives frequently moved beyond the OMB labels. When these responses are encoded with the INSTRUCTOR embedding model and organized via hierarchical clustering, the resulting hierarchy can be "cut" at multiple levels of granularity, yielding coarser solutions that can satisfy regulatory reporting and finer leaves that reveal national, regional, and mixed-heritage detail. Hypothesis-driven tests of user experience reinforced these advantages. On the User Experience Questionnaire, the chatbot outscored the demographic checklist on hedonic qualities, novelty, and stimulation, while the checklist retained pragmatic strengths such as dependability. Perceived group inclusivity also rose when data were collected through the chatbot, regardless of how closely respondents' identities aligned with OMB categories. Overall, the findings indicate that a carefully engineered chatbot, paired with advanced natural-language-processing analyses, can enhance race and ethnicity data collection by producing richer information and fostering a more inclusive, engaging respondent experience.
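
The idea of cutting one hierarchy at several levels of granularity can be illustrated with a minimal sketch. It is not the dissertation's implementation. The hand-made 2-D vectors stand in for INSTRUCTOR embeddings, the labels A1 to B3 are hypothetical placeholder responses, and average linkage with Euclidean distance is an assumption, since the abstract does not name a linkage rule or distance metric.

```python
from math import dist

def average_linkage(a, b, points):
    """Mean Euclidean distance over all cross-cluster pairs (assumed linkage)."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerate(points):
    """Merge the two closest clusters repeatedly; record the partition at each size."""
    clusters = [[i] for i in range(len(points))]
    levels = {len(clusters): [sorted(c) for c in clusters]}
    while len(clusters) > 1:
        # Score every pair of current clusters and pick the closest one.
        pairs = [
            (average_linkage(clusters[i], clusters[j], points), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(pairs)
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        levels[len(clusters)] = sorted(sorted(c) for c in clusters)
    return levels

def cut(levels, k, labels):
    """Return the partition with exactly k clusters, shown with response labels."""
    return [[labels[i] for i in cluster] for cluster in levels[k]]

# Hypothetical placeholder responses and toy 2-D "embeddings" (not real data).
labels = ["A1", "A2", "A3", "B1", "B2", "B3"]
points = [(0.0, 0.0), (0.2, 0.1), (1.0, 0.2),
          (5.0, 5.0), (5.3, 5.1), (6.0, 5.3)]

levels = agglomerate(points)
print(cut(levels, 2, labels))  # coarse cut: two broad groups
print(cut(levels, 3, labels))  # finer cut: one broad group splits further
```

The same recorded hierarchy serves both purposes: a small `k` gives a few broad groups of the kind a reporting standard might require, while a larger `k` keeps finer distinctions, without re-running the clustering.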

Keywords

Chatbots, Demographic data collection, Race, Ethnicity, Natural language processing, Inclusivity