Binary Neural Networks
WINLAB Summer 2021 Research Internship
Overview
This project is part of the Rutgers WINLAB Summer 2021 Research Internship program, bringing together researchers, graduate students, and undergraduate students. Traditionally, power consumption has been an oft-overlooked metric in the training and execution of neural networks, but the paradigm is beginning to shift as the large computing systems used in deep learning continue to grow in scale [1]. It is well established that integer units take up far less physical chip space than floating-point units [2][3], and their power consumption is correspondingly lower. Through our research at WINLAB, we investigated the use of integer and binary fixed-point number implementations in two neural networks trained on the MNIST digits and MNIST fashion datasets to see what effect their use might have on accuracy and training time.
Objectives
- Energy: Integer processing units use up to 60% less physical chip space on a CPU than float processing units. Since fixed-point numbers can be built from integers and bit-wise operations, fixed-point arithmetic unlocks this potential for greater energy efficiency.
- Accuracy: Establishing and maintaining comparable accuracy while transitioning to fixed-point numbers is vital before this method can be implemented at scale.
- Speed: Our fixed-point implementations of neural networks should require the same training time as their floating-point counterparts, allowing for power-saving benefits with next to no drawbacks.
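As a rough illustration of that idea (not the project's actual code), the Go sketch below stores a fixed-point value in a plain integer with an implicit binary scale; the `Fixed` type name and the choice of 16 fractional bits are assumptions made for this example.

```go
package main

import "fmt"

// Fixed is a Q16.16 fixed-point number: the upper 16 bits hold the integer
// part and the lower 16 bits hold the fraction. All arithmetic below uses
// only integer adds, multiplies, and shifts -- no floating-point unit.
type Fixed int32

const fracBits = 16

// FromFloat converts a float64 to Fixed (for I/O only; the math stays integer).
func FromFloat(f float64) Fixed { return Fixed(f * (1 << fracBits)) }

// Float converts back to float64 for printing.
func (a Fixed) Float() float64 { return float64(a) / (1 << fracBits) }

// Add and Mul are the core integer/bit-wise operations.
func (a Fixed) Add(b Fixed) Fixed { return a + b }
func (a Fixed) Mul(b Fixed) Fixed { return Fixed((int64(a) * int64(b)) >> fracBits) }

func main() {
	x, y := FromFloat(1.5), FromFloat(-2.25)
	fmt.Println(x.Add(y).Float()) // -0.75
	fmt.Println(x.Mul(y).Float()) // -3.375
}
```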
Weekly Timeline
week 1 and week 2
- Introductions
- Read existing literature surrounding non-floating-point implementations of neural networks
- Familiarized ourselves with Golang
- Gathered accuracy data using previous years' floating-point neural networks
- Continued implementation of a fixed-point linear algebra framework
week 3
- Implemented code to track weight maxima and minima for each training iteration
- Began implementing the ReLU activation function in place of Sigmoid to move away from floating point
- Made plots of the absolute maxima and minima of the weights in both weight matrices of our neural network to investigate dynamic range
The figures above show that the dynamic range (the minima and maxima of the individual weights in the neural network) increases with time, but slowly and not by much. The range remains small; roughly [-10, 10] should be plenty for accuracy, which bodes well for finite-precision fixed-point numbers.
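A minimal Go sketch of that per-iteration bookkeeping; the flat `weights` slice and the logging pattern are simplifications for illustration rather than the GONN code itself.

```go
package main

import "fmt"

// weightRange scans a weight matrix (stored here as a flat slice for
// simplicity) and returns its minimum and maximum entries.
func weightRange(weights []float64) (min, max float64) {
	min, max = weights[0], weights[0]
	for _, w := range weights[1:] {
		if w < min {
			min = w
		}
		if w > max {
			max = w
		}
	}
	return min, max
}

func main() {
	// During training this would run once per iteration, with the results
	// appended to a log and later plotted as the dynamic range over time.
	w := []float64{-0.42, 0.13, 1.7, -2.3, 0.05}
	lo, hi := weightRange(w)
	fmt.Printf("iteration range: [%.2f, %.2f]\n", lo, hi)
}
```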
week 4
- Began checking intermittent accuracy for the GONN model using a validation set
- Plotted the absolute maxima and minima of the weights in the weight matrices for each epoch alongside accuracy to determine whether there was any connection between dynamic range and accuracy
- Found an accuracy plateau at a relatively small dynamic range, which bodes well for a fixed-point implementation
Because reasonably sized fixed-point numbers are small, anywhere from 32 to 128 bits depending on the specific implementation, this plateau means that it is possible to maintain model accuracy even with finite precision.
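For context, here is a sketch of how an intermittent validation check might look in Go; the `Sample` type, the `predict` stub, and the evaluation cadence are assumptions made for illustration, not the GONN API.

```go
package main

import "fmt"

// Sample pairs an input vector with its true class label.
type Sample struct {
	Input []float64
	Label int
}

// predict stands in for a forward pass that returns the predicted class.
func predict(input []float64) int {
	return 0 // placeholder: a real network would run its forward pass here
}

// validationAccuracy returns the fraction of validation samples classified correctly.
func validationAccuracy(valSet []Sample) float64 {
	correct := 0
	for _, s := range valSet {
		if predict(s.Input) == s.Label {
			correct++
		}
	}
	return float64(correct) / float64(len(valSet))
}

func main() {
	valSet := []Sample{{Input: []float64{0.1, 0.9}, Label: 0}, {Input: []float64{0.8, 0.2}, Label: 1}}
	// In training this check would run every few epochs and be plotted
	// alongside the weight range to look for a connection between the two.
	fmt.Printf("validation accuracy: %.2f\n", validationAccuracy(valSet))
}
```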
week 5
- Added layers to the network to test dynamic range under different circumstances
- Created range and accuracy vs. training iteration graphs as we prototyped different fixed-point networks and ReLU functions
- Created a GitHub repository to bring the team onto one code base
- Encountered an overflow error when using the Sigmoid function with our fixed-point implementation, which resulted in steep drop-offs in accuracy
week 6
[Figures: Broken Sigmoid, Nearly Fixed Sigmoid, Fixed Sigmoid]
We fixed the Sigmoid function by adding code that conditionally handles the different edge cases, including overflow.
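A sketch of that kind of guarded activation, reusing the assumed Q16.16 `Fixed` type from the Objectives sketch; the saturation cutoff and the float round-trip inside are illustrative choices, not the project's exact fix.

```go
package main

import (
	"fmt"
	"math"
)

// Fixed is the same assumed Q16.16 fixed-point type as in the earlier sketch.
type Fixed int32

const fracBits = 16

func FromFloat(f float64) Fixed { return Fixed(f * (1 << fracBits)) }
func (a Fixed) Float() float64  { return float64(a) / (1 << fracBits) }

// SigmoidFixed saturates for large |x| before exponentiating, so the
// intermediate values can never overflow the fixed-point range.
func SigmoidFixed(x Fixed) Fixed {
	const cutoff = 8.0 // sigmoid(±8) is already within ~0.0003 of 0 or 1
	xf := x.Float()
	switch {
	case xf >= cutoff:
		return FromFloat(1.0)
	case xf <= -cutoff:
		return FromFloat(0.0)
	default:
		return FromFloat(1.0 / (1.0 + math.Exp(-xf)))
	}
}

func main() {
	for _, v := range []float64{-20, -1, 0, 1, 20} {
		fmt.Printf("sigmoid(%v) ≈ %.4f\n", v, SigmoidFixed(FromFloat(v)).Float())
	}
}
```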
- Analyzed upper and lower bounds for weight precision by observing the precision needed to maintain the relative order of the output scores
- Continued testing of various weight cutoffs
- Continued working on the fixed-point ReLU implementation (see the sketch after this list)
- Began exploring other, more complicated datasets to test our network on, and settled on MNIST Fashion
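ReLU, by contrast, maps onto fixed-point almost for free, since it only needs a comparison against zero; a minimal sketch with the same assumed `Fixed` type:

```go
package main

import "fmt"

// Fixed is the same assumed Q16.16 fixed-point type as in the earlier sketches.
type Fixed int32

// ReLUFixed needs no floating-point math at all: a negative fixed-point
// value is simply a negative integer, so the activation is a comparison.
func ReLUFixed(x Fixed) Fixed {
	if x < 0 {
		return 0
	}
	return x
}

func main() {
	fmt.Println(ReLUFixed(-98304), ReLUFixed(98304)) // 0 98304 (i.e. 0.0 and 1.5 in Q16.16)
}
```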
week 7
- Completed the fixed-point neural network and applied it to the MNIST Fashion dataset
- Tested the ReLU activation function and found it was not accurate; we will need to add another layer to the network and look for potential bugs to fix this
- Experimented with truncating bits from floating-point numbers to see how much precision is really needed to maintain the accuracy of our networks (sketched below)
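One way to run that kind of truncation experiment is to mask off low-order mantissa bits with math.Float64bits; the sketch below illustrates the approach, and the number of bits kept is a parameter of the experiment rather than a project result.

```go
package main

import (
	"fmt"
	"math"
)

// truncateMantissa zeroes all but the top `keep` bits of a float64's
// 52-bit mantissa, simulating a lower-precision weight.
func truncateMantissa(f float64, keep uint) float64 {
	bits := math.Float64bits(f)
	mask := ^uint64(0) << (52 - keep) // keeps sign, exponent, and the top mantissa bits
	return math.Float64frombits(bits & mask)
}

func main() {
	w := 0.7231845
	for _, keep := range []uint{4, 8, 16} {
		fmt.Printf("keep %2d mantissa bits: %.10f\n", keep, truncateMantissa(w, keep))
	}
}
```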
week 8
- Collected accuracy data for the fixed-point models on the MNIST Digit and MNIST Fashion datasets
- Compared the fixed-point network's accuracy to the floating-point network's accuracy
- Started implementation of a Softmax layer, which would test the extremes of our fixed-point implementation due to the large values returned by the exp() method (see the sketch below)
The graph starts after 100 data points of training. It once again shows that improvements in accuracy do not necessarily require a larger dynamic range, even for problems more complicated than MNIST.
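A standard way to keep exp() within a finite range is to subtract the maximum score before exponentiating, which is exactly the property a fixed-point Softmax would rely on; the float64 sketch below illustrates the idea and is not the project's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// softmax subtracts the maximum score before calling exp, so every exponent
// is <= 0 and exp never returns a huge value -- the behavior a fixed-point
// implementation needs in order to avoid overflow.
func softmax(scores []float64) []float64 {
	maxScore := scores[0]
	for _, s := range scores[1:] {
		if s > maxScore {
			maxScore = s
		}
	}
	out := make([]float64, len(scores))
	sum := 0.0
	for i, s := range scores {
		out[i] = math.Exp(s - maxScore)
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

func main() {
	fmt.Println(softmax([]float64{2.0, 1.0, 0.1}))
}
```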
week 9
- Continued implementation of the Softmax layer
- Ran other datasets through the network to evaluate its effectiveness and observe the different dynamic ranges that different datasets produce
Internship end
The current trajectory of machine learning research has produced a level of abstraction in which machine learning engineers need not concern themselves with low-level hardware details. Our investigation into using fixed-point numbers in neural networks shows promise, and could help bridge that gap by bringing nuanced hardware considerations into an up-to-date software stack. Although the dynamic range of the weights continues to grow over the course of training, our preliminary results indicate that it remains small enough to allow reasonably small fixed-point numbers without significant loss in accuracy. These results suggest that fixed-point numbers are viable, with their power-saving and potentially time-saving benefits. The topic is worth further investigation, especially with other kinds of neural network architectures such as convolutional and recurrent neural networks.