How Well Do Multimodal Large Language Models Score on Visual Personnel Assessments

dc.contributor.author: Liu, Siyi
dc.contributor.committeechair: Hickman, Louis
dc.contributor.committeemember: Hernandez, Ivan
dc.contributor.committeemember: Hsu, Ning
dc.contributor.department: Psychology
dc.date.accessioned: 2026-03-04T13:48:11Z
dc.date.available: 2026-03-04T13:48:11Z
dc.date.issued: 2025-12-18
dc.description.abstract: Multimodal Large Language Models (MLLMs) pose new threats to the validity of visual personnel assessments in high-stakes selection contexts, as their emergent visual perception and understanding capabilities may facilitate applicant cheating. This study investigated the performance of three popular MLLMs on a visual cognitive ability test bundle and a visual forced-choice personality test, across three prompt approaches and three temperature settings. The MLLMs achieved only median-level scores (the 50th percentile) on the visual cognitive ability test, falling short of high-performing human test takers. However, they exhibited top-tier performance (above the 98th percentile) on Conscientiousness on the visual personality test, and scored high on Agreeableness and Emotional Stability when temperatures or prompts were nudged. Given MLLMs' potential to enable applicant cheating in unproctored pre-employment assessments, the study urges test vendors and employers to implement anti-cheating measures and offers related recommendations.
dc.description.abstractgeneral: Multimodal Large Language Models (MLLMs), such as ChatGPT-5 and Gemini-3, have exhibited strong visual perception, reasoning, and understanding abilities in both research and industry reports. Job applicants might leverage MLLMs to cheat on visual personnel assessments, which poses new threats to the validity of such assessments in high-stakes selection contexts. To investigate this issue, the study examined the performance of three popular MLLMs on a visual cognitive ability test bundle and a visual forced-choice personality test, across three prompt approaches and three temperature settings. The MLLMs achieved only median-level scores (the 50th percentile) on the visual cognitive ability test, less competitive than human test takers. However, they exhibited top-tier performance (above the 98th percentile) on the Big Five trait Conscientiousness on the visual personality test, and scored high on other traits, including Agreeableness and Emotional Stability, when temperatures or prompts were nudged. Given MLLMs' potential to enable applicant cheating in unproctored pre-employment assessments, the study urges test vendors and employers to implement anti-cheating measures and offers related recommendations.
dc.description.degree: Master of Science
dc.format.medium: ETD
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10919/141657
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Multimodal large language models
dc.subject: chatbots
dc.subject: visual personnel assessments
dc.subject: personnel selection
dc.subject: hiring
dc.title: How Well Do Multimodal Large Language Models Score on Visual Personnel Assessments
dc.type: Thesis
dc.type.dcmitype: Text
thesis.degree.discipline: Psychology
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: masters
thesis.degree.name: Master of Science
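
The abstract describes querying three MLLMs on visual test items across prompt approaches and temperature settings. As a rough illustration of what such a query loop could look like (not the thesis's actual protocol or materials), here is a minimal Python sketch using the OpenAI chat completions API; the model name, image file, prompt wording, and temperature values are all illustrative assumptions:

```python
# Minimal sketch: presenting a visual test item to an MLLM at varying
# temperatures, loosely mirroring the design described in the abstract.
# The model name, image path, and prompt text are hypothetical.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def encode_image(path: str) -> str:
    """Base64-encode a local image for inline submission to the API."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

item_b64 = encode_image("matrix_item_01.png")  # hypothetical test item

for temperature in (0.0, 0.7, 1.0):  # illustrative temperature settings
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in for any vision-capable model
        temperature=temperature,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Which option completes the pattern? Answer A-E."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{item_b64}"}},
            ],
        }],
    )
    print(temperature, response.choices[0].message.content)
```

Each model response would then be scored against the test key, with scores compared to human norm percentiles as the abstract reports.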

Files

Original bundle
Name: Liu_S_T_2025.pdf
Size: 1.66 MB
Format: Adobe Portable Document Format