Vision-language models for occupational physical exposure assessment: Classification and temporal segmentation of manual material handling tasks
| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Rajabi, Mohammad Sadra | en |
| dc.contributor.author | Ojelade, Aanuoluwapo | en |
| dc.contributor.author | Kim, Sunwook | en |
| dc.contributor.author | Nussbaum, Maury A. | en |
| dc.date.accessioned | 2026-06-09T13:19:25Z | en |
| dc.date.available | 2026-06-09T13:19:25Z | en |
| dc.date.issued | 2026-06 | en |
| dc.description.abstract | Effective physical exposure assessment for manual material handling (MMH) is essential for identifying activities that increase the risk of work-related musculoskeletal disorders and for guiding ergonomic interventions. However, existing methods are labor-intensive and often fail to capture task variability or to estimate task timing characteristics effectively. We evaluated the use of vision-language models (VLMs) to automatically and non-invasively classify eight MMH tasks and specific task conditions (i.e., hand configuration and lifting origin), and to detect task start and end times, using regular RGB video streams. We obtained task classification accuracies of ∼82–85%, lifting-origin classification accuracies of ∼94–98%, and mean absolute start- and end-time errors <0.5 s, in some cases superior to prior work. Classification performance for hand configuration, though, was more variable. These findings demonstrate the potential of VLMs as a practical and scalable tool for physical exposure assessment of MMH tasks. | en |
| dc.description.version | Accepted version | en |
| dc.format.mimetype | application/pdf | en |
| dc.identifier | 104831 (Article number) | en |
| dc.identifier.doi | https://doi.org/10.1016/j.apergo.2026.104831 | en |
| dc.identifier.eissn | 1872-9126 | en |
| dc.identifier.issn | 0003-6870 | en |
| dc.identifier.orcid | Nussbaum, Maury [0000-0002-1887-8431] | en |
| dc.identifier.orcid | Kim, Sun Wook [0000-0003-3624-1781] | en |
| dc.identifier.other | S0003-6870(26)00109-2 (PII) | en |
| dc.identifier.pmid | 42247867 | en |
| dc.identifier.uri | https://hdl.handle.net/10919/143324 | en |
| dc.identifier.volume | 138 | en |
| dc.language.iso | en | en |
| dc.publisher | Elsevier | en |
| dc.relation.uri | https://www.ncbi.nlm.nih.gov/pubmed/42247867 | en |
| dc.rights | In Copyright | en |
| dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
| dc.subject | Generative pretrained transformer (GPT) | en |
| dc.subject | Human activity recognition (HAR) | en |
| dc.subject | Physical exposure assessment | en |
| dc.subject | Temporal action segmentation | en |
| dc.subject | Vision-language model (VLM) | en |
| dc.title | Vision-language models for occupational physical exposure assessment: Classification and temporal segmentation of manual material handling tasks | en |
| dc.title.serial | Applied Ergonomics | en |
| dc.type | Article - Refereed | en |
| dc.type.dcmitype | Text | en |
| dc.type.other | Journal Article | en |
| dcterms.dateAccepted | 2026-06-01 | en |
| pubs.organisational-group | Virginia Tech | en |
| pubs.organisational-group | Virginia Tech/Engineering | en |
| pubs.organisational-group | Virginia Tech/Engineering/Industrial and Systems Engineering | en |
| pubs.organisational-group | Virginia Tech/Faculty of Health Sciences | en |
| pubs.organisational-group | Virginia Tech/All T&R Faculty | en |
| pubs.organisational-group | Virginia Tech/Engineering/COE T&R Faculty | en |