Vision-language models for occupational physical exposure assessment: Classification and temporal segmentation of manual material handling tasks

Field | Value | Language
dc.contributor.author | Rajabi, Mohammad Sadra | en
dc.contributor.author | Ojelade, Aanuoluwapo | en
dc.contributor.author | Kim, Sunwook | en
dc.contributor.author | Nussbaum, Maury A. | en
dc.date.accessioned | 2026-06-09T13:19:25Z | en
dc.date.available | 2026-06-09T13:19:25Z | en
dc.date.issued | 2026-06 | en
dc.description.abstract | Effective physical exposure assessment for manual materials handling (MMH) is essential for identifying activities that increase the risk of work-related musculoskeletal disorders and for guiding ergonomic interventions. However, existing methods are labor-intensive and often fail to capture task variability or to effectively estimate task timing characteristics. We evaluated the use of vision-language models (VLMs) to automatically and non-invasively classify eight MMH tasks and specific task conditions (i.e., hand configuration and lifting origin), and to detect task start and end times, using regular RGB video streams. We obtained task classification accuracies of ∼82-85%, accuracies for classifying lifting origin of ∼94-98%, and mean absolute start and end time errors <0.5 s, superior to prior work in some cases. Classification performance for hand configuration, though, was more variable. These findings demonstrate the potential of VLMs as a practical and scalable tool for physical exposure assessment of MMH tasks. | en
dc.description.version | Accepted version | en
dc.format.mimetype | application/pdf | en
dc.identifier | 104831 (Article number) | en
dc.identifier.doi | https://doi.org/10.1016/j.apergo.2026.104831 | en
dc.identifier.eissn | 1872-9126 | en
dc.identifier.issn | 0003-6870 | en
dc.identifier.orcid | Nussbaum, Maury [0000-0002-1887-8431] | en
dc.identifier.orcid | Kim, Sun Wook [0000-0003-3624-1781] | en
dc.identifier.other | S0003-6870(26)00109-2 (PII) | en
dc.identifier.pmid | 42247867 | en
dc.identifier.uri | https://hdl.handle.net/10919/143324 | en
dc.identifier.volume | 138 | en
dc.language.iso | en | en
dc.publisher | Elsevier | en
dc.relation.uri | https://www.ncbi.nlm.nih.gov/pubmed/42247867 | en
dc.rights | In Copyright | en
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en
dc.subject | Generative pretrained transformer (GPT) | en
dc.subject | Human activity recognition (HAR) | en
dc.subject | Physical exposure assessment | en
dc.subject | Temporal action segmentation | en
dc.subject | Vision-language model (VLM) | en
dc.title | Vision-language models for occupational physical exposure assessment: Classification and temporal segmentation of manual material handling tasks | en
dc.title.serial | Applied Ergonomics | en
dc.type | Article - Refereed | en
dc.type.dcmitype | Text | en
dc.type.other | Journal Article | en
dcterms.dateAccepted | 2026-06-01 | en
pubs.organisational-group | Virginia Tech | en
pubs.organisational-group | Virginia Tech/Engineering | en
pubs.organisational-group | Virginia Tech/Engineering/Industrial and Systems Engineering | en
pubs.organisational-group | Virginia Tech/Faculty of Health Sciences | en
pubs.organisational-group | Virginia Tech/All T&R Faculty | en
pubs.organisational-group | Virginia Tech/Engineering/COE T&R Faculty | en

Files

Original bundle
Name: Rajabi_Deposit.pdf
Size: 1.37 MB
Format: Adobe Portable Document Format
Description: Accepted version