Implementation of the "Attention Is All You Need" paper
Interested in implementing the "Attention Is All You Need" paper? Brando K has open-sourced a from-scratch implementation of the seminal research that introduced the original Transformer. Some features of this implementation:
✔️ Highly customizable configuration and training loop
✔️ Runnable on CPU and GPU
✔️ W&B integration for detailed logging of every metric
✔️ Pretrained models and their training details
✔️ Gradient Accumulation
✔️ Label smoothing
✔️ BPE and WordLevel Tokenizers
✔️ Dynamic Batching
✔️ Batch Dataset Processing
✔️ BLEU score calculation after every epoch
✔️ Documented tensor dimensions for every step of the architecture (a minimal sketch in that spirit follows this list)
✔️ Translation progress shown on an example sentence after every epoch
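To give a flavor of what "documented dimensions" can look like in practice, here is a minimal PyTorch sketch of the paper's scaled dot-product attention with the shape of each tensor noted inline. This is an illustrative sketch under generic shape assumptions, not code taken from the repository.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q: (batch, heads, seq_len_q, d_k)
    # k: (batch, heads, seq_len_k, d_k)
    # v: (batch, heads, seq_len_k, d_v)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5      # (batch, heads, seq_len_q, seq_len_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)                 # (batch, heads, seq_len_q, seq_len_k)
    return weights @ v                                  # (batch, heads, seq_len_q, d_v)

# Quick shape check with dummy tensors
q = k = v = torch.randn(2, 8, 10, 64)
print(scaled_dot_product_attention(q, k, v).shape)      # torch.Size([2, 8, 10, 64])
```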
He’s also integrated Weights & Biases to automatically log runs and visualizations, and has made his W&B project public so everyone can see how the model is trained.
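As a rough illustration of how such a W&B integration typically looks, the sketch below logs training loss from a loop that also uses the gradient accumulation and label smoothing listed above. The toy model, random data, and hyperparameter values are hypothetical placeholders, not the repository's actual Transformer or configuration.

```python
import torch
import torch.nn as nn
import wandb

# Hypothetical placeholders: a toy model and random data stand in for the
# repo's Transformer, dataset, and config values.
vocab_size, pad_idx, accumulation_steps = 1000, 0, 4
model = nn.Linear(32, vocab_size)                       # stand-in for the Transformer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1, ignore_index=pad_idx)

# mode="offline" lets the sketch run without a W&B account; drop it to sync online
wandb.init(project="attention-is-all-you-need-paper", mode="offline",
           config={"accumulation_steps": accumulation_steps})

model.train()
optimizer.zero_grad()
for step in range(100):
    features = torch.randn(8, 32)                       # stand-in for an encoded batch
    targets = torch.randint(0, vocab_size, (8,))        # stand-in for target tokens
    loss = criterion(model(features), targets)
    (loss / accumulation_steps).backward()              # accumulate gradients over small batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                                # one update per accumulation window
        optimizer.zero_grad()
        wandb.log({"train/loss": loss.item()})          # logged metrics appear in the W&B run
```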
👉 Check out the excellent implementation: https://github.com/bkoch4142/attention-is-all-you-need-paper
👉 Weights & Biases project page: https://wandb.ai/bkoch4142/attention-is-all-you-need-paper/runs/1rbhz2as?workspace=user-bkoch4142
Tags: ML News