How Well Do Multimodal Large Language Models Score on Visual Personnel Assessments?
Abstract
Multimodal Large Language Models (MLLMs) pose new threats to the validity of visual personnel assessments in high-stakes selection contexts, as their emergent visual perception and understanding capabilities may facilitate applicant cheating. This study investigated the performance of three popular MLLMs on a visual cognitive ability test bundle and a visual forced-choice personality test, across three prompt approaches and three temperature settings. The MLLMs achieved only median-level scores (around the 50th percentile) on the visual cognitive ability test, falling short of high-performing human test takers. However, they exhibited top-tier performance (above the 98th percentile) on Conscientiousness on the visual personality test, and attained high scores on Agreeableness and Emotional Stability when temperatures or prompts were adjusted. Given MLLMs’ potential to enable applicant cheating in unproctored pre-employment assessments, the study urges test vendors and employers to implement anti-cheating measures and offers related recommendations.