Stanford Student Builds Mini ChatGPT
Testing old LLMs with new training methods
Large language models such as GPT-3 have revolutionized natural language processing in recent years, outperforming specialized systems on tasks including summarization, question answering, and sentiment analysis. The new-found abilities of these models, as demonstrated by ChatGPT, have been boosted in large part by aligning them with human feedback via reinforcement learning from human feedback (RLHF). However, the massive computational requirements of these models make it difficult for individual researchers or smaller research groups to participate in this area of study, and LLMs have been controlled primarily by large tech companies. Stanford student Yanjia Li set out to investigate whether much smaller models can benefit from the same alignment techniques as part of the CS224N class, and has published the code and results of the experiment.
The Question
The study seeks to answer the question: can smaller language models be aligned with human feedback to produce conversations similar to those of larger models like GPT-3 175B? By reproducing the InstructGPT training pipeline with Reinforcement Learning from Human Feedback (RLHF) tuning, the author explored the impact of RLHF on a smaller model, GPT-2 Medium.
Implementation
The author implemented the GPT architecture from scratch (based on Andrej Karpathy's nanoGPT, with pretrained weights from Hugging Face) and used a low-rank approximation to reduce the computational cost of supervised fine-tuning. They then created an RLHF training pipeline consisting of three trainers: supervised fine-tuning, reward model training, and Proximal Policy Optimization (PPO) training. The model was initially fine-tuned on half of the Anthropic HH-RLHF dataset, while the other half was reserved for reward model training.
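To make the low-rank idea concrete, here is a minimal PyTorch sketch of a LoRA-style wrapper around a frozen pretrained linear layer. The class name, rank, and scaling values are illustrative assumptions and are not taken from the minChatGPT code.

```python
# Minimal LoRA-style sketch: freeze a pretrained linear layer and train
# only a small rank-r update. Names and hyperparameters are illustrative.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Adds a trainable low-rank update to a frozen pretrained linear layer.

    The effective weight becomes W + (alpha / r) * B @ A, where A and B are
    small rank-r matrices, so only a tiny fraction of parameters is trained.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pretrained weights frozen

        self.scale = alpha / r
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus low-rank update; B starts at zero, so the wrapped
        # layer is initially identical to the pretrained one.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)


if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(1024, 1024), r=8)
    out = layer(torch.randn(2, 16, 1024))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(out.shape, "trainable params:", trainable)
```

Because only the two small rank-r matrices are updated, the trainable parameter count per layer drops from in_features × out_features to r × (in_features + out_features), which is what makes supervised fine-tuning feasible on modest hardware.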

Example results from the paper
Results
The experiments were conducted on two datasets: Anthropic HH-RLHF and the Awesome ChatGPT Prompts dataset. The latter was chosen to evaluate the model's generalization ability, as it contains diverse prompts that differ significantly from the training data. ChatGPT, used as the judge, preferred both the supervised fine-tuned (SFT) and RLHF-tuned (PPO) models over the vanilla GPT-2 Medium model. This suggests that vanilla GPT-2 tends to generate less engaging, more repetitive responses, whereas the SFT and PPO models produce more dialogue-like outputs. Furthermore, ChatGPT exhibited a clear preference for the PPO model over the SFT model, demonstrating the effectiveness of RLHF tuning.

Qualitative analysis revealed that the vanilla GPT-2 model often generated short and incoherent answers, while the SFT and PPO models produced more useful and contextually appropriate responses. This indicates that even a smaller model like GPT-2 Medium holds substantial knowledge that SFT and RLHF can bring out.
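As a rough illustration of how such pairwise preferences can be collected, the sketch below asks ChatGPT to pick the better of two candidate responses using the OpenAI Python client (version 1.x). The prompt wording, model name, and function names are assumptions for illustration, not the exact setup used in the report.

```python
# Illustrative ChatGPT-as-judge sketch; assumes OPENAI_API_KEY is set and
# the openai>=1.0 Python client is installed. Names are hypothetical.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = (
    "You are comparing two assistant responses to the same prompt.\n"
    "Prompt: {prompt}\n\nResponse A: {a}\n\nResponse B: {b}\n\n"
    "Which response is more helpful and coherent? Answer with exactly 'A' or 'B'."
)


def judge_pair(prompt: str, response_a: str, response_b: str) -> str:
    """Ask ChatGPT which of two candidate responses it prefers."""
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(prompt=prompt, a=response_a, b=response_b),
        }],
        temperature=0,  # deterministic judging
    )
    return completion.choices[0].message.content.strip()


if __name__ == "__main__":
    verdict = judge_pair(
        "Explain why the sky is blue.",
        "Because of Rayleigh scattering of sunlight by air molecules.",
        "sky sky sky blue because blue.",
    )
    print("Preferred response:", verdict)
```

Repeating this comparison over a held-out prompt set and tallying wins gives the kind of preference rates reported above for vanilla GPT-2 versus the SFT and PPO models.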


Conclusion
Although LLMs like Meta's Llama can run on high-end consumer graphics cards, such hardware remains out of reach for many researchers due to financial constraints or limited availability. This underscores the need for even smaller language models that can be experimented with effectively, allowing researchers with limited resources to contribute to advances in NLP. GPT-2 could be an excellent candidate for further optimization and tuning to create a more efficient and capable model that runs on local devices. By refining and optimizing smaller models like GPT-2, the research community can ensure broader participation and accelerate innovation in this critical area of artificial intelligence.
The Report: https://github.com/ethanyanjiali/minChatGPT/blob/main/report.pdf
The Code: https://github.com/ethanyanjiali/minChatGPT