Classifying Diverse Manual Material Handling Tasks Using Vision Transformers and Recurrent Neural Networks
Abstract
Frequent or prolonged manual material handling (MMH) is a major risk factor for work-related musculoskeletal disorders, which impose considerable health and economic burdens. Assessing physical exposures is essential for identifying high-risk tasks and implementing targeted ergonomic interventions. However, variability in MMH task performance across individuals and work settings complicates physical exposure assessments, and conventional assessment tools often suffer from limitations such as bias, discomfort, behavioral interference, and high cost. Non-contact (ambient) methods and automated data collection and analysis are promising alternatives for assessing physical exposure. We investigated the use of vision transformers and recurrent neural networks for non-contact MMH task classification from RGB video of eight simulated MMH tasks. Spatial features were extracted with the Contrastive Language-Image Pre-training (CLIP) vision transformer and then classified by a Bidirectional Long Short-Term Memory (BiLSTM) model to capture temporal dependencies across video frames. Our model achieved a mean accuracy of 88% in classifying MMH tasks, performance comparable to methods using depth cameras or wearable sensors, while potentially offering better scalability and feasibility in real work environments. Future work includes improving temporal modeling, integrating task-adapted feature extraction, and validating across more diverse workers and occupational environments.
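For readers unfamiliar with this kind of pipeline, the sketch below illustrates one way a CLIP-plus-BiLSTM video classifier can be assembled in PyTorch. It is not the authors' implementation: the CLIP checkpoint, hidden size, frame count, and pooling choice are illustrative assumptions, not details reported in the abstract.

```python
# Minimal sketch of a CLIP (spatial) + BiLSTM (temporal) video task classifier.
# All model names and hyperparameters below are assumptions for illustration.
import torch
import torch.nn as nn
from transformers import CLIPVisionModel

NUM_TASKS = 8          # eight simulated MMH tasks (from the abstract)
FRAMES_PER_CLIP = 32   # assumed number of frames sampled per video

class CLIPBiLSTMClassifier(nn.Module):
    def __init__(self, hidden_size=256, num_classes=NUM_TASKS):
        super().__init__()
        # Pre-trained CLIP vision transformer, frozen and used only as a
        # per-frame spatial feature extractor (checkpoint is an assumption)
        self.clip = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
        for p in self.clip.parameters():
            p.requires_grad = False
        feat_dim = self.clip.config.hidden_size  # 768 for ViT-B/32
        # Bidirectional LSTM captures temporal dependencies across frames
        self.bilstm = nn.LSTM(feat_dim, hidden_size, batch_first=True,
                              bidirectional=True)
        self.head = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, frames):
        # frames: (batch, time, 3, 224, 224), CLIP-preprocessed RGB frames
        b, t = frames.shape[:2]
        with torch.no_grad():
            # Fold time into the batch dimension for per-frame encoding
            feats = self.clip(pixel_values=frames.flatten(0, 1)).pooler_output
        feats = feats.view(b, t, -1)       # (batch, time, feat_dim)
        seq_out, _ = self.bilstm(feats)    # (batch, time, 2 * hidden_size)
        return self.head(seq_out[:, -1])   # task logits from the final step

model = CLIPBiLSTMClassifier()
logits = model(torch.randn(2, FRAMES_PER_CLIP, 3, 224, 224))  # shape (2, 8)
```

Freezing the CLIP backbone and training only the BiLSTM and classification head is one common design choice for small video datasets; fine-tuning the backbone or pooling over all time steps rather than the last are equally plausible variants.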