How Well Do Multimodal Large Language Models Score on Visual Personnel Assessments

dc.contributor.author: Liu, Siyi
dc.contributor.committeechair: Hickman, Louis
dc.contributor.committeemember: Hernandez, Ivan
dc.contributor.committeemember: Hsu, Ning
dc.contributor.department: Psychology
dc.date.accessioned: 2026-03-04T13:48:11Z
dc.date.available: 2026-03-04T13:48:11Z
dc.date.issued: 2025-12-18
dc.description.abstract: Multimodal Large Language Models (MLLMs) pose new threats to the validity of visual personnel assessments in high-stakes selection contexts, as their emergent visual perception and understanding capabilities may facilitate applicant cheating. This study investigated the performance of three popular MLLMs on a visual cognitive ability test bundle and a visual forced-choice personality test, across three prompt approaches and three temperature settings. The MLLMs achieved only median-level scores (the 50th percentile) on the visual cognitive ability test, falling short of high-performing human test takers. However, they exhibited top-tier performance (above the 98th percentile) on Conscientiousness on the visual personality test, and scored high on Agreeableness and Emotional Stability when temperatures or prompts were nudged. Given MLLMs' potential to enable applicant cheating in unproctored pre-employment assessments, the study urges test vendors and employers to implement anti-cheating measures and offers related recommendations.
dc.description.abstractgeneral: Multimodal Large Language Models (MLLMs), such as ChatGPT-5 and Gemini-3, have exhibited strong visual perception, reasoning, and understanding abilities in both research and industry reports. Job applicants might leverage MLLMs to cheat on visual personnel assessments, which poses new threats to the validity of such assessments in high-stakes selection contexts. To investigate this issue, the study examined the performance of three popular MLLMs on a visual cognitive ability test bundle and a visual forced-choice personality test, across three prompt approaches and three temperature settings. The MLLMs achieved only median-level scores (the 50th percentile) on the visual cognitive ability test, less competitive than human test takers. However, they exhibited top-tier performance (above the 98th percentile) on the Big Five trait Conscientiousness on the visual personality test, and scored high on other traits, including Agreeableness and Emotional Stability, when temperatures or prompts were nudged. Given MLLMs' potential to enable applicant cheating in unproctored pre-employment assessments, the study urges test vendors and employers to implement anti-cheating measures and offers related recommendations.
dc.description.degree: Master of Science
dc.format.medium: ETD
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10919/141657
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Multimodal large language models
dc.subject: chatbots
dc.subject: visual personnel assessments
dc.subject: personnel selection
dc.subject: hiring
dc.title: How Well Do Multimodal Large Language Models Score on Visual Personnel Assessments
dc.type: Thesis
dc.type.dcmitype: Text
thesis.degree.discipline: Psychology
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: masters
thesis.degree.name: Master of Science
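
The abstract describes querying three MLLMs on visual test items across prompt approaches and temperature settings. As a rough illustration of what such a query loop could look like (not the thesis's actual protocol or materials), here is a minimal Python sketch using the OpenAI chat completions API; the model name, image file, prompt wording, and temperature values are all illustrative assumptions:

```python
# Minimal sketch: presenting a visual test item to an MLLM at varying
# temperatures, loosely mirroring the design described in the abstract.
# The model name, image path, and prompt text are hypothetical.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def encode_image(path: str) -> str:
    """Base64-encode a local image for inline submission to the API."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

item_b64 = encode_image("matrix_item_01.png")  # hypothetical test item

for temperature in (0.0, 0.7, 1.0):  # illustrative temperature settings
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in for any vision-capable model
        temperature=temperature,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Which option completes the pattern? Answer A-E."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{item_b64}"}},
            ],
        }],
    )
    print(temperature, response.choices[0].message.content)
```

Each model response would then be scored against the test key, with scores compared to human norm percentiles as the abstract reports.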

Files

Original bundle
Name: Liu_S_T_2025.pdf
Size: 1.66 MB
Format: Adobe Portable Document Format