NVIDIA, Arm, and Intel Collaborate To Push FP8 Format Standard For Deep Learning
NVIDIA, Arm, and Intel are working together to establish an 8-bit floating point number standard for machine learning.
Three industry-leading AI hardware companies, NVIDIA, Arm, and Intel, have teamed up to start the push for 8-bit precision floating point number formats. In a jointly authored whitepaper, the three companies explore the benefits the smaller format could provide for more efficient deep learning.
The proposed FP8 format
Floating point numbers (floats) are most commonly seen in a 32-bit form, meaning each number takes up 32 bits of memory: that's around 268 million numbers per gigabyte. Floats are used to represent the weights of a machine learning model, so for a model with 175 billion parameters, that's roughly 650 gigabytes of data for the weights alone.
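To make that arithmetic concrete, here is a small Python sketch. The 175-billion-parameter count is just the example from above, and the `weight_memory_gib` helper is purely illustrative:

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Return the storage needed for the weights, in GiB (2**30 bytes)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30

params = 175_000_000_000  # e.g. a 175B-parameter model

for bits in (32, 16, 8):
    print(f"FP{bits}: {weight_memory_gib(params, bits):,.0f} GiB")

# FP32: 652 GiB
# FP16: 326 GiB
# FP8:  163 GiB
```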
For a while now, 16-bit floats have been the standard in machine learning, meaning weights can be stored in half the space and twice as much data can be moved to GPUs or other processors over the same bandwidth. Now, NVIDIA, Arm, and Intel want to cut that size in half again and move towards a standardized use of 8-bit floats.
Floats are a complicated subject, but the core idea is that decimal numbers are represented in a scientific-notation-like format that computers can work with efficiently. Going from 32 bits to 8 bits cuts precision by a large margin, but for machine learning models, the precision loss is acceptable given the gains in storage space and processing speed, according to the jointly authored whitepaper.
Two FP8 formats are proposed: an E5M2 format (5 exponent bits, 2 mantissa bits) that follows IEEE 754 conventions, and an E4M3 format (4 exponent bits, 3 mantissa bits) that is better suited to ML because it reclaims some of the special-value bit patterns defined by IEEE 754 in favor of a wider range of representable values.
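As a rough illustration of how the two encodings trade exponent range for mantissa precision, here is a minimal decoder sketch. It assumes the bit layouts and special-value conventions described in the whitepaper (bias 7 for E4M3 with a single NaN pattern and no infinities, bias 15 for E5M2 with IEEE-style infinities and NaNs); the `decode_fp8` helper is illustrative, not code from the paper:

```python
def decode_fp8(byte: int, fmt: str = "E4M3") -> float:
    """Decode one 8-bit pattern under the E4M3 or E5M2 interpretation."""
    if fmt == "E4M3":
        exp_bits, man_bits, bias = 4, 3, 7
    else:  # E5M2
        exp_bits, man_bits, bias = 5, 2, 15

    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    max_exp = (1 << exp_bits) - 1

    if fmt == "E5M2" and exp == max_exp:
        # IEEE-style special values: all-ones exponent means inf or NaN
        return sign * float("inf") if man == 0 else float("nan")
    if fmt == "E4M3" and exp == max_exp and man == (1 << man_bits) - 1:
        # E4M3 keeps only this single pattern as NaN and has no infinities
        return float("nan")

    if exp == 0:  # subnormal: no implicit leading 1
        return sign * 2.0 ** (1 - bias) * (man / (1 << man_bits))
    return sign * 2.0 ** (exp - bias) * (1 + man / (1 << man_bits))

# Largest finite magnitudes under each interpretation:
print(decode_fp8(0b0_1111_110, "E4M3"))  # 448.0
print(decode_fp8(0b0_11110_11, "E5M2"))  # 57344.0
```

The trade-off this shows: E5M2 reaches a much larger dynamic range, while E4M3 spends its extra mantissa bit on finer-grained values, which is why the whitepaper positions it as the more ML-friendly of the two.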

Despite the loss in numerical precision, models trained in FP8 were found to perform effectively on par with models trained in FP16 across a variety of tasks. NVIDIA also showed off FP8's speed in the recent MLPerf Inference v2.1 round, with lightning-fast BERT model inference.
Find out more
Tags: ML News