Exploring Per-Input Filter Selection and Approximation Techniques for Deep Neural Networks
We propose a dynamic, input-dependent filter approximation and selection technique to improve the computational efficiency of deep neural networks (DNNs). The approximation techniques convert the 32-bit floating-point representation of filter weights into lower-precision values by reducing the number of bits used to represent each weight. To measure the per-input error between the trained full-precision filter weights and the approximated weights, we use a metric called Multiplication Error (ME). For convolutional layers, ME is calculated by subtracting the approximated filter weights from the original filter weights, convolving the difference with the input, and taking the grand sum of the resulting matrix. For fully connected layers, ME is calculated the same way, except that the difference is matrix-multiplied with the input instead of convolved. ME is used to identify the approximated filters in a layer that result in low inference accuracy. To maintain the accuracy of the network, the weights of these filters are replaced with the original full-precision weights. Prior work has primarily focused on input-independent (static) replacement of filter weights with low-precision weights. In that approach, all filter weights in the network are replaced by approximated weights, which decreases inference accuracy; the decrease is larger for more aggressive approximation techniques. Our proposed technique aims to achieve higher inference accuracy by not approximating filters that generate high ME. Using the proposed per-input filter selection technique with weights truncated to 3 bits, LeNet achieves an accuracy of 95.6% on the MNIST dataset, a 3.34% drop from the original accuracy of 98.9%. With static filter approximation, by contrast, LeNet achieves an accuracy of 90.5%, an 8.5% drop from the original accuracy. The aim of our research is to use low-precision weights in deep learning algorithms to achieve high classification accuracy with less computational overhead. We explore various filter approximation techniques and implement a per-input filter selection and approximation technique that selects the filters to approximate at run time.
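The ME computation and per-input selection described above can be illustrated for a fully connected layer with a minimal sketch. It is not the thesis implementation, and several details are assumptions the abstract does not specify: truncation is modeled as keeping the sign, the exponent, and the top few mantissa bits of a float32 value; each weight row is treated as one filter; and a filter is restored to full precision when the absolute value of its ME exceeds a fixed threshold. The names `truncate_mantissa`, `multiplication_error`, `select_filters`, and `threshold` are hypothetical.

```python
import struct

def truncate_mantissa(value, keep_bits):
    """Assumed approximation: keep the sign, the exponent, and the top
    `keep_bits` of the 23 float32 mantissa bits; zero the remaining bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    drop = 23 - keep_bits
    mask = (0xFFFFFFFF >> drop) << drop
    (out,) = struct.unpack(">f", struct.pack(">I", bits & mask))
    return out

def multiplication_error(full_row, approx_row, x):
    """ME for one fully connected filter (one weight row):
    the sum over the input of (original - approximated) * input."""
    return sum((w - a) * xi for w, a, xi in zip(full_row, approx_row, x))

def select_filters(full, approx, x, threshold):
    """For a given input x, use the approximated row unless |ME| exceeds
    the threshold (assumed criterion); then fall back to the original row.
    Returns the chosen rows and the indices of the restored filters."""
    chosen, restored = [], []
    for i, (w_row, a_row) in enumerate(zip(full, approx)):
        if abs(multiplication_error(w_row, a_row, x)) > threshold:
            chosen.append(w_row)
            restored.append(i)
        else:
            chosen.append(a_row)
    return chosen, restored

# Toy layer with two filters. The first row is exactly representable with
# 3 mantissa bits, so its ME is zero; the second row is not.
weights = [[0.8125, -0.40625], [0.3, 0.7]]
approx = [[truncate_mantissa(w, 3) for w in row] for row in weights]
x = [1.0, 2.0]
mixed, restored = select_filters(weights, approx, x, threshold=0.01)
```

In this toy case only the second filter is restored to full precision for the given input; a different input could change which filters are restored, which is the sense in which the selection is per-input.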
General Audience Abstract
Deep neural networks (DNNs), much like the human brain, can learn important information from the data provided to them and can classify a new input based on the labels in that dataset. Deep learning is heavily used in devices that perform computer vision, image and video processing, and voice detection. The computational overhead of classification with DNNs prohibits their use in smaller devices. This research aims to improve network efficiency in deep learning by replacing the 32-bit weights in neural networks with lower-precision weights in an input-dependent manner. Trained neural networks are numerically robust: different layers develop tolerance to minor variations in network parameters, so the differences induced by low-precision calculations fall well within the tolerance limit of the network. However, for aggressive approximation techniques such as truncating to 3 or 2 bits, inference accuracy drops severely. We propose a dynamic technique that, at run time, identifies the approximated filters that result in low inference accuracy for a given input and replaces them with the original filters to achieve high inference accuracy. The proposed technique has been tested for image classification with convolutional neural networks (CNNs). The datasets used are MNIST and CIFAR-10. The networks used are a 4-layer CNN, LeNet-5, and AlexNet.
- Masters Theses