Binary Neural Networks
WINLAB Summer 2021 Research Internship
Overview
This project is part of the Rutgers WINLAB Summer 2021 Research Internship program, bringing together researchers, graduate students, and undergraduate students. Traditionally, power consumption has been an oft-overlooked metric in the training and execution of neural networks, but the paradigm is beginning to shift as the large computing systems used in deep learning continue to grow in scale [1]. It is well established that integer units take up far less physical chip space than floating-point units [2][3], and their power consumption is correspondingly lower. Through our research at WINLAB, we investigated the use of integer and binary fixed-point number implementations in two neural networks trained on the MNIST digits and MNIST fashion datasets to see what effect their use might have on accuracy and training time.
Objectives
- Energy: Integer processing units use up to 60% less physical chip space on a CPU than float processing units. Since fixed-point numbers can be built from integers and bit-wise operations, fixed-point arithmetic unlocks this potential for greater energy efficiency.
- Accuracy: Establishing and maintaining comparable accuracy while transitioning to fixed-point numbers is vital before this method can be implemented at scale.
- Speed: Our fixed-point implementations of neural networks should require the same training time as their floating-point counterparts, allowing for power-saving benefits with next to no drawbacks.
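As a rough illustration of that idea (not the project's actual code), the Go sketch below stores a fixed-point value in a plain integer with an implicit binary scale; the `Fixed` type name and the choice of 16 fractional bits are assumptions made for this example.

```go
package main

import "fmt"

// Fixed is a Q16.16 fixed-point number: the upper 16 bits hold the integer
// part and the lower 16 bits hold the fraction. All arithmetic below uses
// only integer adds, multiplies, and shifts -- no floating-point unit.
type Fixed int32

const fracBits = 16

// FromFloat converts a float64 to Fixed (for I/O only; the math stays integer).
func FromFloat(f float64) Fixed { return Fixed(f * (1 << fracBits)) }

// Float converts back to float64 for printing.
func (a Fixed) Float() float64 { return float64(a) / (1 << fracBits) }

// Add and Mul are the core integer/bit-wise operations.
func (a Fixed) Add(b Fixed) Fixed { return a + b }
func (a Fixed) Mul(b Fixed) Fixed { return Fixed((int64(a) * int64(b)) >> fracBits) }

func main() {
	x, y := FromFloat(1.5), FromFloat(-2.25)
	fmt.Println(x.Add(y).Float()) // -0.75
	fmt.Println(x.Mul(y).Float()) // -3.375
}
```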
Weekly Timeline
week 1 and week 2
- Introductions
- Read existing literature surrounding non-floating-point implementations of neural networks
- Familiarized ourselves with Golang
- Gathered accuracy data using previous years' floating-point neural networks
- Continued implementation of a fixed-point linear algebra framework
week 3
- Implemented code to track weight maxima and minima for each training iteration
- Began implementing the ReLU activation function in place of Sigmoid to move away from floating point
- Made plots of the absolute maxima and minima of the weights in both weight matrices of our neural network to investigate dynamic range
The figures above show that the dynamic range (the minima and maxima of the individual weights in the neural network) increases with time, but slowly and not by much. The range remains small; roughly [-10, 10] should be plenty for accuracy, which bodes well for finite-precision fixed-point numbers.
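A minimal Go sketch of that per-iteration bookkeeping; the flat `weights` slice and the logging pattern are simplifications for illustration rather than the GONN code itself.

```go
package main

import "fmt"

// weightRange scans a weight matrix (stored here as a flat slice for
// simplicity) and returns its minimum and maximum entries.
func weightRange(weights []float64) (min, max float64) {
	min, max = weights[0], weights[0]
	for _, w := range weights[1:] {
		if w < min {
			min = w
		}
		if w > max {
			max = w
		}
	}
	return min, max
}

func main() {
	// During training this would run once per iteration, with the results
	// appended to a log and later plotted as the dynamic range over time.
	w := []float64{-0.42, 0.13, 1.7, -2.3, 0.05}
	lo, hi := weightRange(w)
	fmt.Printf("iteration range: [%.2f, %.2f]\n", lo, hi)
}
```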
week 4
- Began checking intermittent accuracy for the GONN model using a validation set
- Plotted the absolute maxima and minima of the weights in the weight matrices for each epoch alongside accuracy to determine whether there was any connection between dynamic range and accuracy
- Found an accuracy plateau at a relatively small dynamic range, which bodes well for a fixed-point implementation
Because reasonably sized fixed-point numbers are small, anywhere from 32 to 128 bits depending on the specific implementation, this plateau means that it is possible to maintain model accuracy even with finite precision.
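For context, here is a sketch of how an intermittent validation check might look in Go; the `Sample` type, the `predict` stub, and the evaluation cadence are assumptions made for illustration, not the GONN API.

```go
package main

import "fmt"

// Sample pairs an input vector with its true class label.
type Sample struct {
	Input []float64
	Label int
}

// predict stands in for a forward pass that returns the predicted class.
func predict(input []float64) int {
	return 0 // placeholder: a real network would run its forward pass here
}

// validationAccuracy returns the fraction of validation samples classified correctly.
func validationAccuracy(valSet []Sample) float64 {
	correct := 0
	for _, s := range valSet {
		if predict(s.Input) == s.Label {
			correct++
		}
	}
	return float64(correct) / float64(len(valSet))
}

func main() {
	valSet := []Sample{{Input: []float64{0.1, 0.9}, Label: 0}, {Input: []float64{0.8, 0.2}, Label: 1}}
	// In training this check would run every few epochs and be plotted
	// alongside the weight range to look for a connection between the two.
	fmt.Printf("validation accuracy: %.2f\n", validationAccuracy(valSet))
}
```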
week 5
- Added layers to the network to test dynamic range under different circumstances
- Created range and accuracy vs. training iteration graphs as we prototyped different fixed-point networks and ReLU functions
- Created a GitHub repository to bring the team onto one code base
- Encountered an overflow error when using the Sigmoid function with our fixed-point implementation, which resulted in steep drop-offs in accuracy
week 6
[Figures: Broken Sigmoid, Nearly Fixed Sigmoid, Fixed Sigmoid]
We fixed the Sigmoid function by adding code that conditionally handles the different edge cases, including overflow.
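A sketch of that kind of guarded activation, reusing the assumed Q16.16 `Fixed` type from the Objectives sketch; the saturation cutoff and the float round-trip inside are illustrative choices, not the project's exact fix.

```go
package main

import (
	"fmt"
	"math"
)

// Fixed is the same assumed Q16.16 fixed-point type as in the earlier sketch.
type Fixed int32

const fracBits = 16

func FromFloat(f float64) Fixed { return Fixed(f * (1 << fracBits)) }
func (a Fixed) Float() float64  { return float64(a) / (1 << fracBits) }

// SigmoidFixed saturates for large |x| before exponentiating, so the
// intermediate values can never overflow the fixed-point range.
func SigmoidFixed(x Fixed) Fixed {
	const cutoff = 8.0 // sigmoid(±8) is already within ~0.0003 of 0 or 1
	xf := x.Float()
	switch {
	case xf >= cutoff:
		return FromFloat(1.0)
	case xf <= -cutoff:
		return FromFloat(0.0)
	default:
		return FromFloat(1.0 / (1.0 + math.Exp(-xf)))
	}
}

func main() {
	for _, v := range []float64{-20, -1, 0, 1, 20} {
		fmt.Printf("sigmoid(%v) ≈ %.4f\n", v, SigmoidFixed(FromFloat(v)).Float())
	}
}
```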
- Analyzed upper and lower bounds for weight precision by observing the precision needed to maintain the relative order of the output scores
- Continued testing of various weight cutoffs
- Continued working on the fixed-point ReLU implementation (see the sketch after this list)
- Began exploring other, more complicated datasets to test our network on, and settled on MNIST Fashion
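ReLU, by contrast, maps onto fixed-point almost for free, since it only needs a comparison against zero; a minimal sketch with the same assumed `Fixed` type:

```go
package main

import "fmt"

// Fixed is the same assumed Q16.16 fixed-point type as in the earlier sketches.
type Fixed int32

// ReLUFixed needs no floating-point math at all: a negative fixed-point
// value is simply a negative integer, so the activation is a comparison.
func ReLUFixed(x Fixed) Fixed {
	if x < 0 {
		return 0
	}
	return x
}

func main() {
	fmt.Println(ReLUFixed(-98304), ReLUFixed(98304)) // 0 98304 (i.e. 0.0 and 1.5 in Q16.16)
}
```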
week 7
- Completed the fixed-point neural network and applied it to the MNIST Fashion dataset
- Tested the ReLU activation function and found it was not accurate; we will need to add another layer to the network and look for potential bugs to fix this
- Experimented with truncating bits from floating-point numbers to see how much precision is really needed to maintain the accuracy of our networks (sketched below)
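One way to run that kind of truncation experiment is to mask off low-order mantissa bits with math.Float64bits; the sketch below illustrates the approach, and the number of bits kept is a parameter of the experiment rather than a project result.

```go
package main

import (
	"fmt"
	"math"
)

// truncateMantissa zeroes all but the top `keep` bits of a float64's
// 52-bit mantissa, simulating a lower-precision weight.
func truncateMantissa(f float64, keep uint) float64 {
	bits := math.Float64bits(f)
	mask := ^uint64(0) << (52 - keep) // keeps sign, exponent, and the top mantissa bits
	return math.Float64frombits(bits & mask)
}

func main() {
	w := 0.7231845
	for _, keep := range []uint{4, 8, 16} {
		fmt.Printf("keep %2d mantissa bits: %.10f\n", keep, truncateMantissa(w, keep))
	}
}
```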
week 8
- Collected accuracy data for the fixed-point models on the MNIST Digit and MNIST Fashion datasets
- Compared the fixed-point network's accuracy to the floating-point network's accuracy
- Started implementation of a Softmax layer, which would test the extremes of our fixed-point implementation due to the large values returned by the exp() method (see the sketch below)
The graph starts after 100 data points of training. It once again shows that improvements in accuracy do not necessarily require a larger dynamic range, even for problems more complicated than MNIST.
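A standard way to keep exp() within a finite range is to subtract the maximum score before exponentiating, which is exactly the property a fixed-point Softmax would rely on; the float64 sketch below illustrates the idea and is not the project's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// softmax subtracts the maximum score before calling exp, so every exponent
// is <= 0 and exp never returns a huge value -- the behavior a fixed-point
// implementation needs in order to avoid overflow.
func softmax(scores []float64) []float64 {
	maxScore := scores[0]
	for _, s := range scores[1:] {
		if s > maxScore {
			maxScore = s
		}
	}
	out := make([]float64, len(scores))
	sum := 0.0
	for i, s := range scores {
		out[i] = math.Exp(s - maxScore)
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

func main() {
	fmt.Println(softmax([]float64{2.0, 1.0, 0.1}))
}
```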
week 9
- Continued implementation of the Softmax layer
- Ran other datasets through the network to evaluate its effectiveness and observe the different dynamic ranges that different datasets produce
Internship end
The current trajectory of machine learning research has produced a level of abstraction in which machine learning engineers need not concern themselves with low-level hardware details. Our investigation into using fixed-point numbers in neural networks shows promise, and could help bridge that gap by bringing nuanced hardware considerations into an up-to-date software stack. Although the dynamic range of the weights continues to grow over the course of training, our preliminary results indicate that it remains small enough to allow reasonably small fixed-point numbers without significant loss in accuracy. These results suggest that fixed-point numbers are viable, with their power-saving and potentially time-saving benefits. The topic is worth further investigation, especially with other kinds of neural network architectures such as convolutional and recurrent neural networks.