Classifying Diverse Manual Material Handling Tasks Using Vision Transformers and Recurrent Neural Networks

Date

2025-09

Publisher

SAGE Publications

Abstract

Frequent or prolonged manual material handling (MMH) is a major risk factor for work-related musculoskeletal disorders, which impose considerable health and economic burdens. Assessing physical exposures is essential for identifying high-risk tasks and implementing targeted ergonomic interventions. However, variability in MMH task performance across individuals and work settings complicates physical exposure assessment. Further, conventional assessment tools often suffer from limitations such as bias, discomfort, behavioral interference, and high costs. Non-contact (ambient) methods and automated data collection and analysis are promising alternatives for assessing physical exposure. We investigated the use of vision transformers and recurrent neural networks for non-contact classification of eight simulated MMH tasks from RGB video. Spatial features were extracted using the Contrastive Language-Image Pre-training (CLIP) vision transformer and then classified by a bidirectional Long Short-Term Memory (BiLSTM) model to capture temporal dependencies between video frames. Our model achieved a mean accuracy of 88% in classifying MMH tasks, performance comparable to that of methods using depth cameras or wearable sensors, while potentially offering better scalability and feasibility in real work environments. Future work includes improving temporal modeling, integrating task-adapted feature extraction, and validating across more diverse workers and occupational environments.
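The pipeline described above (frozen CLIP vision transformer for per-frame spatial features, BiLSTM over the frame sequence for temporal modeling, eight-way task classification) can be illustrated with a minimal PyTorch sketch. This is not the authors' code: the checkpoint name (openai/clip-vit-base-patch32), hidden size, frozen backbone, clip length, and last-time-step pooling are illustrative assumptions.

import torch
import torch.nn as nn
from transformers import CLIPVisionModel

class ClipBiLSTMClassifier(nn.Module):
    def __init__(self, num_classes: int = 8, hidden_size: int = 256):
        super().__init__()
        # Frozen CLIP ViT backbone extracts spatial features per frame
        # (assumed checkpoint; the paper's exact backbone may differ).
        self.backbone = CLIPVisionModel.from_pretrained(
            "openai/clip-vit-base-patch32"
        )
        for p in self.backbone.parameters():
            p.requires_grad = False
        feat_dim = self.backbone.config.hidden_size  # 768 for ViT-B/32
        # Bidirectional LSTM captures temporal dependencies across frames.
        self.lstm = nn.LSTM(
            feat_dim, hidden_size, batch_first=True, bidirectional=True
        )
        self.head = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, 224, 224), CLIP-preprocessed RGB frames.
        b, t = frames.shape[:2]
        with torch.no_grad():
            out = self.backbone(pixel_values=frames.flatten(0, 1))
        feats = out.pooler_output.view(b, t, -1)  # (batch, time, feat_dim)
        seq, _ = self.lstm(feats)                 # (batch, time, 2*hidden)
        return self.head(seq[:, -1])              # logits for 8 MMH tasks

# Usage with a dummy 16-frame clip (stand-in for a processed video segment):
model = ClipBiLSTMClassifier()
clip_frames = torch.randn(1, 16, 3, 224, 224)
logits = model(clip_frames)
print(logits.argmax(dim=-1))  # predicted task index, 0..7

Freezing the backbone and training only the recurrent layers is one common design for this kind of two-stage pipeline; whether the study fine-tuned CLIP or used a different pooling strategy is not stated in the abstract.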

Keywords

physical exposure assessment, musculoskeletal disorders, vision-language model (VLM), generative pretrained transformer (GPT), computer vision
