
Implementation of the "Attention Is All You Need" paper

Created on March 24 | Last edited on March 24
Interested in implementing the “Attention Is All You Need” paper? Brando K has open-sourced a from-scratch implementation of the seminal paper that introduced the original Transformer architecture. Some features of this implementation:
✔️ Highly customizable configuration and training loop
✔️ Runnable on CPU and GPU
✔️ W&B integration for detailed logging of every metric
✔️ Pretrained models and their training details
✔️ Gradient accumulation (see the training-step sketch after this list)
✔️ Label smoothing
✔️ BPE and WordLevel Tokenizers
✔️ Dynamic batching
✔️ Batch dataset processing
✔️ BLEU score calculation after every epoch
✔️ Documented dimensions for every step of the architecture
✔️ Translation progress shown on an example after every epoch
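
To make two of the features above concrete, here's a minimal sketch, assuming a standard PyTorch setup, of how gradient accumulation and label smoothing typically fit into a training step. This is illustrative only, not the repository's actual code; the model, data, and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

vocab_size, d_model, accum_steps = 1000, 64, 4

# Stand-in for the transformer: a tiny embedding + projection "model".
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.Linear(d_model, vocab_size),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Label smoothing is built into PyTorch's cross-entropy loss (PyTorch >= 1.10).
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

for step in range(8):
    tokens = torch.randint(0, vocab_size, (16, 32))   # dummy (batch, seq) inputs
    targets = torch.randint(0, vocab_size, (16, 32))  # dummy target token ids

    logits = model(tokens)  # (batch, seq, vocab)
    loss = criterion(logits.view(-1, vocab_size), targets.view(-1))

    # Scale the loss so gradients average over the accumulated micro-batches,
    # then only step the optimizer every `accum_steps` batches.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Accumulating gradients this way lets the effective batch size (here 16 × 4) exceed what fits in memory at once, which is especially useful when training transformers on a single GPU.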

He’s also integrated Weights & Biases to automatically log runs and visualizations, and he’s made his W&B project public so everyone can see how the model is trained.
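
For a sense of what that integration involves, here is a minimal sketch of per-epoch W&B logging; the project and metric names are assumptions for illustration, not necessarily the ones used in the repo.

```python
import wandb

# Start a run; hyperparameters passed via `config` show up in the W&B UI.
run = wandb.init(
    project="attention-is-all-you-need",  # hypothetical project name
    config={"lr": 1e-3, "label_smoothing": 0.1},
)

for epoch in range(3):
    train_loss = 1.0 / (epoch + 1)  # placeholder metrics for the sketch
    bleu = 10.0 * (epoch + 1)
    wandb.log({"epoch": epoch, "train/loss": train_loss, "val/bleu": bleu})

run.finish()
```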

👉 Check out the excellent implementation: https://github.com/bkoch4142/attention-is-all-you-need-paper
Tags: ML News