Hardware

It should come as no surprise that pre-training LLMs is a hardware-intensive effort. The following examples of current models are a good guide here:

• PaLM (540B, Google): 6144 TPU v4 chips used in total, made up of two TPU v4 Pods connected over data center network (DCN), using a combination of model and data parallelism. […]
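To make "a combination of model and data parallelism" concrete, here is a minimal, hypothetical sketch in JAX (the style of framework used for TPU-pod training). It is not PaLM's training code; the array shapes, axis names, and function names are purely illustrative. A 2-D device mesh assigns one axis to data parallelism (sharding the batch) and one to model parallelism (sharding the weights), and the compiler inserts the required cross-device communication.

import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange whatever devices are available into a 2-D mesh:
# one axis for data parallelism, one for model (tensor) parallelism.
# (Shape is illustrative; real runs pick the split to match the pod topology.)
devices = np.array(jax.devices()).reshape(-1, 1)
mesh = Mesh(devices, axis_names=("data", "model"))

# Shard the batch along the "data" axis and a weight matrix along "model".
batch = jax.device_put(
    jnp.ones((32, 512)),
    NamedSharding(mesh, P("data", None)),
)
weights = jax.device_put(
    jnp.ones((512, 2048)),
    NamedSharding(mesh, P(None, "model")),
)

# Under jit, XLA propagates the shardings and inserts the collectives;
# each device only ever holds its own shard of the inputs and output.
@jax.jit
def forward(x, w):
    return x @ w

out = forward(batch, weights)
print(out.shape, out.sharding)

The same mesh pattern scales from a single host to thousands of chips; at pod scale the two mesh axes simply map onto many more physical devices.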

The scaling laws

Before you dive into training, it’s important to cover how LLMs scale. Understanding scaling lets you effectively balance the size and complexity of your model against the size of the data you’ll use to train it. Some relevant history here: OpenAI originally introduced “the LLM scaling laws” in 2020. They suggested that […]
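For concreteness, that 2020 work (Kaplan et al., “Scaling Laws for Neural Language Models”) fit test loss as simple power laws in parameter count N, dataset size D (in tokens), and training compute C. In LaTeX, the reported forms look roughly like:

\[
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D},
\qquad
L(C) = \left(\frac{C_c}{C}\right)^{\alpha_C}
\]

with fitted exponents of roughly \(\alpha_N \approx 0.076\) and \(\alpha_D \approx 0.095\), and constants \(N_c\), \(D_c\), \(C_c\) fit to the data. The practical takeaway is that loss falls predictably as you scale any one axis while the others are unconstrained, which is what makes the trade-off between model size and data size quantifiable in the first place.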