Appendix

Large pre-trained transformer language models, or simply large language models (LLMs), are a recent breakthrough in machine learning that have vastly extended our capabilities in natural language processing (NLP). Based on transformer architectures, with as many as hundreds of billions of parameters, and trained on hundreds of terabytes of textual data, recent LLMs such […]
References

- What Language Model Architecture and Pre-training Objective Work Best for Zero-Shot Generalization?
- GPT-3 Paper – Language Models are Few-Shot Learners
- GPT-NeoX-20B: An Open-Source Autoregressive Language Model
- OPT: Open Pre-trained Transformer Language Models
- PaLM: Scaling Language Modeling with Pathways
- Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
- How To Build an Efficient NLP […]
Conclusion

Whether it’s OpenAI, Cohere, or open-source projects like EleutherAI, cutting-edge large language models are built on Weights & Biases. Our platform enables collaboration across teams performing the complex, expensive work required to train and push these models to […]
RLHF

RLHF (Reinforcement Learning from Human Feedback) extends instruction tuning by incorporating human feedback after the instruction-tuning step to improve model alignment with user expectations. Pre-trained LLMs often exhibit unintended behaviors, such as fabricating facts, generating biased or toxic responses, or failing to follow instructions, due to the misalignment […]
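The first stage of RLHF trains a reward model on human preference pairs. A minimal sketch of the standard pairwise (Bradley–Terry) objective, using plain Python rather than a tensor library; the function name `preference_loss` is ours, not from the source:

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise reward-model loss: -log sigmoid(r_chosen - r_rejected).

    r_chosen is the reward model's score for the response a human
    preferred; r_rejected is the score for the other response. The loss
    shrinks as the model learns to score preferred responses higher.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

In practice this loss is averaged over a batch of comparison pairs and backpropagated through the reward model; the trained reward model then scores generations during the reinforcement-learning step.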
Instruction tuning

At this point, let’s assume we have a pre-trained, general-purpose LLM. If we did our job well, our model can already be used for domain-specific tasks without further tuning, in both few-shot and zero-shot learning scenarios. That said, zero-shot learning generally performs much worse than its few-shot counterpart on plenty of tasks like […]
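The zero-shot versus few-shot distinction comes down to whether task demonstrations are included in the prompt. A minimal sketch (the helper `build_prompt` and its format are our own illustration, not a prescribed template):

```python
def build_prompt(task: str, query: str, examples=None) -> str:
    """Build an in-context learning prompt.

    With examples=None this is a zero-shot prompt: only the task
    description and the query. With a list of (input, output) pairs,
    the demonstrations are prepended, giving a few-shot prompt.
    """
    parts = [task]
    for x, y in (examples or []):
        parts.append(f"Input: {x}\nOutput: {y}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)
```

For example, `build_prompt("Translate English to French.", "cheese")` is zero-shot, while passing `examples=[("sea otter", "loutre de mer")]` turns the same call into a one-shot prompt.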
Bias and toxicity

There are potential risks associated with large-scale, general-purpose language models trained on web text. Which is to say: humans have biases, those biases make their way into data, and models that learn from that data can inherit them. In addition to perpetuating or exacerbating social stereotypes, you want to ensure your […]
Model evaluation

Typically, pre-trained models are evaluated on diverse language model datasets to assess their ability to perform logical reasoning, translation, natural language inference, question answering, and more. Machine learning practitioners have coalesced around a variety of standard evaluation benchmarks. A few popular examples include:

- Open-Domain Question Answering tasks: TriviaQA, Natural Questions, Web Questions
- Natural […]
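Open-domain QA benchmarks such as TriviaQA are usually scored with an exact-match metric over normalized answer strings. A sketch following the common normalization convention (lowercase, strip articles and punctuation, collapse whitespace); the function names are ours:

```python
import re
import string

def normalize(s: str) -> str:
    """Normalize an answer string before comparison."""
    s = s.lower()
    s = re.sub(r"\b(a|an|the)\b", " ", s)          # drop English articles
    s = "".join(ch for ch in s if ch not in string.punctuation)
    return " ".join(s.split())                      # collapse whitespace

def exact_match(prediction: str, gold_answers: list) -> float:
    """1.0 if the prediction matches any reference answer, else 0.0."""
    return float(any(normalize(prediction) == normalize(a) for a in gold_answers))
```

Averaging `exact_match` over a benchmark's examples gives the EM score typically reported alongside token-level F1.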
Pre-training steps

Training a multi-billion parameter LLM is usually a highly experimental process with lots of trial and error. Normally, the team would start with a much smaller model size, make sure it’s promising, and scale up to more and more parameters. Keep in mind that as you scale, there will be issues that require addressing which simply won’t be […]
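When planning that scale-up, a quick back-of-the-envelope parameter count helps pick candidate model sizes. A rough sketch using the well-known approximation of ~12·n_layers·d_model² parameters per decoder-only transformer stack (attention plus a 4× MLP), ignoring layer norms and biases; the function is our own illustration:

```python
def approx_params(n_layers: int, d_model: int, vocab_size: int = 0) -> int:
    """Ballpark parameter count for a decoder-only transformer.

    ~12 * n_layers * d_model^2 covers the transformer blocks
    (4*d^2 for attention projections, 8*d^2 for the 4x-wide MLP),
    plus vocab_size * d_model for the token embedding matrix.
    """
    return 12 * n_layers * d_model ** 2 + vocab_size * d_model
```

For instance, 96 layers with d_model = 12288 lands around 174B block parameters, in the right ballpark for a GPT-3-scale (175B) model, which is all this estimate is for.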
Dataset pre-processing

In this section, we’ll cover both data adjustments (like deduplication and cleaning) and the pros and cons of various tokenization strategies. Let’s start with the former:

Dataset Handling

To ensure training data is high-quality and diverse, several pre-processing techniques can be used before the pre-training steps:

- Data sampling: Certain data components can be […]
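The simplest form of the deduplication mentioned above is exact-match dedup: hash each document after light normalization and keep only the first occurrence. A minimal sketch (fuzzy methods such as MinHash, which also catch near-duplicates, are out of scope here; the function name is ours):

```python
import hashlib

def deduplicate(docs: list) -> list:
    """Drop exact duplicates, keeping first occurrences.

    Documents are keyed by a SHA-256 hash of their text after
    lowercasing and whitespace collapsing, so trivially reformatted
    copies of the same document are also removed.
    """
    seen, kept = set(), []
    for doc in docs:
        key = hashlib.sha256(" ".join(doc.split()).lower().encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept
```

Hashing keeps memory bounded by the number of unique documents rather than their total size, which matters at web-corpus scale.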
Dataset collection

Bad data leads to bad models. But careful processing of high-quality, high-volume, diverse datasets directly contributes to model performance on downstream tasks, as well as to model convergence. Dataset diversity is especially important for LLMs: diversity improves the model’s cross-domain knowledge as well as its downstream generalization capability. Training on […]
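One common way to act on dataset diversity is to assemble the training stream as a weighted mixture of sources, up-sampling smaller high-quality corpora. A minimal sketch under assumed inputs (a dict of named corpora and a dict of mixture weights; both the helper and its signature are our own illustration):

```python
import random

def sample_mixture(sources: dict, weights: dict, n: int, seed: int = 0) -> list:
    """Draw n training documents from several corpora by mixture weight.

    sources maps a corpus name to its list of documents; weights maps
    the same names to relative sampling weights (need not sum to 1).
    Sampling is with replacement, so small corpora can be up-sampled.
    """
    rng = random.Random(seed)
    names = list(sources)
    picks = rng.choices(names, weights=[weights[s] for s in names], k=n)
    return [rng.choice(sources[name]) for name in picks]
```

With weights like `{"web": 1, "books": 3}`, the smaller books corpus contributes roughly three quarters of the sampled stream despite its size.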