Automated Vision-Based Tracking and Action Recognition of Earthmoving Construction Operations

Virginia Tech


Current practice in construction productivity and emission monitoring relies either on manual stopwatch studies, which are labor-intensive and prone to human error, or on RFID and GPS tracking devices, which can be costly and impractical. To address these limitations, this work presents a novel computer-vision-based method for automated 2D tracking, 3D localization, and action recognition of construction equipment from different camera viewpoints. In the proposed method, a new algorithm based on Histograms of Oriented Gradients and hue-saturation Colors (HOG+C) is used for 2D tracking of earthmoving equipment. Once the equipment is detected, its position is localized in 3D using a Direct Linear Transformation (DLT) followed by non-linear optimization. To analyze the performance of these operations automatically, a new algorithm for recognizing equipment actions is developed. First, each video is represented as a collection of spatio-temporal features by extracting space-time interest points and describing each with a Histogram of Oriented Gradients (HOG). The algorithm then learns the distribution of these features by clustering their HOG descriptors. Equipment action categories are learned using a multi-class binary Support Vector Machine (SVM) classifier. Given a new, unseen video sequence, the method recognizes and localizes equipment actions. The method has been extensively tested on 859 videos of earthmoving operations. Experimental results, with average accuracies of 86.33% for excavator and 98.33% for truck action recognition, show the promise of the proposed method for automated performance monitoring.
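As a rough illustration of what a combined shape-and-color appearance descriptor can look like, the sketch below concatenates per-cell gradient-orientation histograms with a hue-saturation histogram. It is a simplified, assumption-laden stand-in, not the HOG+C algorithm of this work: it omits block normalization, bin interpolation, and all detection and tracking logic. The cell size, bin counts, and function names (`hog_c_descriptor` and its helpers) are hypothetical, and the input is assumed to be an RGB patch with values in [0, 1].

```python
import colorsys

import numpy as np


def cell_orientation_histograms(gray, cell=8, n_bins=9):
    """Magnitude-weighted histograms of unsigned gradient orientation, one per cell."""
    gray = np.asarray(gray, dtype=float)
    gy, gx = np.gradient(gray)  # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation in [0, 180)
    # Clip so that a value rounding up to exactly 180.0 still lands in the last bin.
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    h, w = gray.shape
    hists = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            b = bins[i:i + cell, j:j + cell].ravel()
            m = mag[i:i + cell, j:j + cell].ravel()
            hists.append(np.bincount(b, weights=m, minlength=n_bins))
    return np.array(hists)


def hue_saturation_histogram(rgb, h_bins=8, s_bins=4):
    """2D histogram over hue and saturation; brightness (HSV value) is ignored."""
    hist = np.zeros((h_bins, s_bins))
    for r, g, b in np.asarray(rgb, dtype=float).reshape(-1, 3):
        hue, sat, _ = colorsys.rgb_to_hsv(r, g, b)
        hist[min(int(hue * h_bins), h_bins - 1), min(int(sat * s_bins), s_bins - 1)] += 1
    return hist.ravel()


def l2_normalize(v, eps=1e-12):
    return v / (np.linalg.norm(v) + eps)


def hog_c_descriptor(rgb, cell=8, n_bins=9):
    """Concatenate an L2-normalized orientation part and an L2-normalized color part."""
    rgb = np.asarray(rgb, dtype=float)
    gray = rgb @ np.array([0.299, 0.587, 0.114])  # luma approximation
    hog = l2_normalize(cell_orientation_histograms(gray, cell, n_bins).ravel())
    color = l2_normalize(hue_saturation_histogram(rgb))
    return np.concatenate([hog, color])
```

Ignoring the HSV value channel makes the color part less sensitive to illumination changes, which is one plausible reason to pair hue-saturation color with gradient shape. A tracker would then compare such descriptors between a template and candidate windows in later frames; that matching step is not shown.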
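The 3D localization step (a DLT solve followed by non-linear refinement) can be sketched for the simple case of triangulating one equipment point seen in two or more calibrated views. This assumes known 3x4 projection matrices; the abstract does not state how the cameras are calibrated or exactly what the DLT is applied to, and the function names (`triangulate_dlt`, `refine_gauss_newton`, `reprojection_error`) are hypothetical. The linear stage takes the null vector of a stacked system via SVD. The non-linear stage shown here is plain Gauss-Newton on reprojection error with a finite-difference Jacobian; the original work may use a different optimizer.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X (shape (3,)) with a 3x4 camera matrix P to pixel (u, v)."""
    x = np.asarray(P, dtype=float) @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more views."""
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        P = np.asarray(P, dtype=float)
        # Each view gives two equations linear in the homogeneous point X:
        #   u * (P[2] . X) - P[0] . X = 0  and  v * (P[2] . X) - P[1] . X = 0
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.vstack(rows)
    # Minimize ||A X|| subject to ||X|| = 1: the right singular vector
    # belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def residuals(projections, points_2d, X):
    """Stacked reprojection residuals (predicted minus observed pixels)."""
    return np.concatenate(
        [project(P, X) - np.asarray(p, dtype=float) for P, p in zip(projections, points_2d)]
    )


def reprojection_error(projections, points_2d, X):
    """Sum of squared reprojection residuals, in squared pixels."""
    r = residuals(projections, points_2d, X)
    return float(r @ r)


def refine_gauss_newton(projections, points_2d, X0, iters=10, eps=1e-6):
    """Refine a 3D point by Gauss-Newton iterations on the reprojection error."""
    X = np.asarray(X0, dtype=float).copy()
    for _ in range(iters):
        r = residuals(projections, points_2d, X)
        # Forward-difference Jacobian of the residual vector with respect to X.
        J = np.empty((r.size, 3))
        for k in range(3):
            dX = np.zeros(3)
            dX[k] = eps
            J[:, k] = (residuals(projections, points_2d, X + dX) - r) / eps
        # Gauss-Newton step: least-squares solution of J @ step = -r.
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        X += step
        if np.linalg.norm(step) < 1e-10:
            break
    return X
```

Typical use is `X = refine_gauss_newton(cams, pixels, triangulate_dlt(cams, pixels))`. The linear solve supplies a good starting point, and the refinement replaces its algebraic error with the geometrically meaningful pixel error. In practice image coordinates are usually normalized before the DLT solve to improve conditioning; that step is omitted here for brevity.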
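The action-recognition pipeline (cluster local descriptors into a codebook, represent each video by its histogram of codewords, then classify with binary SVMs combined into a multi-class decision) can be sketched as follows. Several assumptions apply. Space-time interest-point detection and HOG description are taken as already done, so each video arrives as an array of descriptors. The codebook uses plain k-means with deterministic farthest-point seeding. The binary classifiers are linear SVMs trained by subgradient descent on the regularized hinge loss and combined one-vs-all. None of these choices, nor the names `kmeans`, `bag_of_words`, `train_one_vs_all`, and `predict`, are taken from the original work.

```python
import numpy as np


def assign(X, centers):
    """Index of the nearest codeword (squared Euclidean distance) for each row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        # Next seed: the point farthest from all seeds chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = assign(X, centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers


def bag_of_words(descriptors, centers):
    """Normalized histogram of codeword occurrences for one video."""
    words = assign(np.asarray(descriptors, dtype=float), centers)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / max(hist.sum(), 1.0)


def train_linear_svm(X, y, lam=1e-3, epochs=300, lr=0.5):
    """Binary linear SVM (labels +1/-1) by subgradient descent on the regularized hinge loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for t in range(1, epochs + 1):
        active = y * (X @ w + b) < 1  # samples inside or beyond the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(X)
        grad_b = -y[active].sum() / len(X)
        step = lr / np.sqrt(t)  # decaying step size
        w -= step * grad_w
        b -= step * grad_b
    return w, b


def train_one_vs_all(X, labels, classes):
    """One binary SVM per action class: that class (+1) versus all others (-1)."""
    return {c: train_linear_svm(X, np.where(labels == c, 1.0, -1.0)) for c in classes}


def predict(models, x):
    """Return the class whose binary SVM gives the largest decision value."""
    return max(models, key=lambda c: models[c][0] @ x + models[c][1])
```

For a new video, `predict(models, bag_of_words(video_descriptors, centers))` returns the action label with the highest one-vs-all score. Farthest-point seeding keeps the sketch deterministic but is sensitive to outliers; k-means++ seeding is the more common choice on real descriptor sets.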



Support Vector Machine, Histogram of Oriented Gradients, Action Recognition, 2D Tracking, Construction Performance Monitoring