How Well Do Multimodal Large Language Models Score on Visual Personnel Assessments
| dc.contributor.author | Liu, Siyi | en |
| dc.contributor.committeechair | Hickman, Louis | en |
| dc.contributor.committeemember | Hernandez, Ivan | en |
| dc.contributor.committeemember | Hsu, Ning | en |
| dc.contributor.department | Psychology | en |
| dc.date.accessioned | 2026-03-04T13:48:11Z | en |
| dc.date.available | 2026-03-04T13:48:11Z | en |
| dc.date.issued | 2025-12-18 | en |
| dc.description.abstract | Multimodal Large Language Models (MLLMs) pose new threats to the validity of visual personnel assessments in high-stakes selection contexts, as their emergent visual perception and understanding capabilities may facilitate applicant cheating. This study investigated the performance of three popular MLLMs on one visual cognitive ability test bundle and one visual forced-choice personality test, across three prompt approaches and three temperature settings. The MLLMs achieved only median-level scores (around the 50th percentile) on the visual cognitive ability test, falling short of high-performing human test takers. However, they exhibited top-tier performance (above the 98th percentile) on Conscientiousness on the visual personality test, and they reached high scores on Agreeableness and Emotional Stability when temperatures or prompts were nudged. Given MLLMs’ potential to enable applicant cheating in unproctored pre-employment assessments, the study urges test vendors and employers to implement anti-cheating measures and offers related recommendations. | en |
| dc.description.abstractgeneral | Multimodal Large Language Models (MLLMs), such as ChatGPT-5 and Gemini-3, have exhibited strong visual perception, reasoning, and understanding abilities in both research and industry reports. Job applicants might leverage MLLMs to cheat on visual personnel assessments, which poses new threats to the validity of such assessments in high-stakes selection contexts. To investigate this issue, the study examined the performance of three popular MLLMs on one visual cognitive ability test bundle and one visual forced-choice personality test, across three prompt approaches and three temperature settings. The MLLMs achieved only median-level scores (around the 50th percentile) on the visual cognitive ability test, falling short of high-performing human test takers. However, they exhibited top-tier performance (above the 98th percentile) on one Big Five trait, Conscientiousness, on the visual personality test, and they scored high on other traits, including Agreeableness and Emotional Stability, when temperatures or prompts were nudged. Given MLLMs’ potential to enable applicant cheating in unproctored pre-employment assessments, the study urges test vendors and employers to implement anti-cheating measures and offers related recommendations. | en |
| dc.description.degree | Master of Science | en |
| dc.format.medium | ETD | en |
| dc.format.mimetype | application/pdf | en |
| dc.identifier.uri | https://hdl.handle.net/10919/141657 | en |
| dc.language.iso | en | en |
| dc.publisher | Virginia Tech | en |
| dc.rights | In Copyright | en |
| dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
| dc.subject | Multimodal large language models | en |
| dc.subject | chatbots | en |
| dc.subject | visual personnel assessments | en |
| dc.subject | personnel selection | en |
| dc.subject | hiring | en |
| dc.title | How Well Do Multimodal Large Language Models Score on Visual Personnel Assessments | en |
| dc.type | Thesis | en |
| dc.type.dcmitype | Text | en |
| thesis.degree.discipline | Psychology | en |
| thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
| thesis.degree.level | masters | en |
| thesis.degree.name | Master of Science | en |