Neural Network Evaluation Parallelization

Neural networks work by computing each layer's activations from a linear combination of the previous layer's activations, with weights determined by the network's training, passed through an activation function.

This is equivalent to the matrix operation a' = σ(Wa + b), where a is the vector of the previous layer's activations, W is the weight matrix, b is the bias vector, and σ is the activation function.
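As a minimal sketch (the layer sizes, random weights, and sigmoid activation below are illustrative, not taken from the text), evaluating one layer is just a matrix-vector product followed by an element-wise activation:

```python
import numpy as np

def sigma(z):
    """Element-wise sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical layer: 4 inputs, 3 outputs.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))  # weights learned during training
b = rng.standard_normal(3)       # biases learned during training
x = rng.standard_normal(4)       # previous layer's activations

a = sigma(W @ x + b)             # next layer's activations
print(a.shape)                   # (3,)
```

The `W @ x` product is exactly the matrix-vector multiplication whose cost is analyzed below.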

Matrix operations are easily parallelized. Below is a table showing the number of operations needed to multiply an NxN matrix by a vector of length N, the number of cycles needed if optimally parallelized, and the number of cores to achieve optimal parallelization.

| N   | Operations | Parallelized Cycles | Cores Required |
|-----|------------|---------------------|----------------|
| 2   | 6          | 2                   | 4              |
| 4   | 28         | 3                   | 16             |
| 8   | 120        | 4                   | 64             |
| 16  | 496        | 5                   | 256            |
| 32  | 2,016      | 6                   | 1,024          |
| 64  | 8,128      | 7                   | 4,096          |
| 128 | 32,640     | 8                   | 16,384         |
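The table follows from a simple cost model: each of the N output rows needs N multiplications and N - 1 additions; with N^2 cores, all multiplications happen in one cycle, and the additions then reduce in a log2(N)-deep tree. A quick sketch that reproduces the table:

```python
import math

def matvec_costs(n):
    """Cost model for multiplying an n x n matrix by a length-n vector."""
    ops = n * (2 * n - 1)           # n multiplies + (n - 1) adds per output row
    cycles = 1 + int(math.log2(n))  # 1 cycle of multiplies + log2(n)-deep add tree
    cores = n * n                   # one core per multiplication
    return ops, cycles, cores

for n in (2, 4, 8, 16, 32, 64, 128):
    print(n, *matvec_costs(n))
```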

Commercial CPUs top out at around 32 cores (e.g., the AMD Ryzen Threadripper 3970X), so for anything larger than about a 5x5 matrix, we need to use GPUs, which have many more cores.

GPUs

GPUs have many more cores than any CPU. Although their cores cannot handle workloads as complex as CPU cores can, they excel at performing simple additions and multiplications very quickly.

GPUs also have the advantage of specially designed circuits that perform an addition and a multiplication in a single clock cycle (CPUs can take several). They can also have circuits designed specifically for matrix multiplication, able to multiply two 4x4 matrices in a single clock cycle.
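Numerically, the fused 4x4 operation such a matrix unit performs in one cycle is a multiply-accumulate, D = A x B + C. A NumPy sketch (the random matrices are illustrative), checked against the 64 explicit multiplies and adds the hardware performs in parallel:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)).astype(np.float32)
B = rng.standard_normal((4, 4)).astype(np.float32)
C = rng.standard_normal((4, 4)).astype(np.float32)

# One matrix-unit-style step: multiply two 4x4 matrices and accumulate.
D = A @ B + C

# Reference result via the explicit scalar multiplies and adds.
ref = np.zeros((4, 4), dtype=np.float32)
for i in range(4):
    for j in range(4):
        for k in range(4):
            ref[i, j] += A[i, k] * B[k, j]
ref += C

print(np.allclose(D, ref, atol=1e-5))  # True
```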

For more information on GPUs, see Cornell and NVIDIA.

The computer used for training has an NVIDIA RTX A4000. Its full datasheet can be found here.

ONNX

ONNX (the Open Neural Network Exchange) is an open AI ecosystem that defines standards for representing machine learning models. These standards allow ONNX models to be run on a wide variety of machines.

ONNX supports a wide variety of frameworks, including MATLAB, Keras, TensorFlow, and PyTorch. For a full list and detailed instructions on converting models to and from ONNX, see this page: https://onnx.ai/supported-tools.html

Instructions for converting a TensorFlow model to ONNX:

  1. Install the ONNX converter:
    pip install git+https://github.com/onnx/tensorflow-onnx
  2. To convert a model from TensorFlow to ONNX, run the following:
    python -m tf2onnx.convert --saved-model tensorflow-model-path --output model.onnx
  3. The model is now ready to be run on ONNX Runtime.


ONNX Runtime

ONNX does not itself perform inference. To do that, you need to install a runtime that can execute ONNX models. See the following page for installation and operation instructions: https://onnxruntime.ai/index.html#getStartedTable

Using ONNX Runtime, you can perform inference up to 17x faster. ONNX Runtime can also accelerate model training (currently PyTorch only) by up to 40%.
