
Tokenizer Comparison

Fineweb-EDU 1.4b. Tokenizers compared (approximate vocabulary size in parentheses):
* llama2 (~32k)
* llama3 (~128k)
* neox (~50k)
Created on November 19 | Last edited on May 12

Standard Panels



[Figure: bpb vs. throughput/total_tokens (5G to 40G tokens, log axis)]
[Figure: unlabeled metric vs. Step (0 to 8k steps)]
Run set: 3 runs
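Because the three tokenizers split the same text into different numbers of tokens, per-token loss is not directly comparable across these runs; bits per byte (bpb) normalizes by the UTF-8 byte length of the text instead. A minimal sketch, assuming the logged loss is mean cross-entropy in nats per token and that the token and byte counts of the evaluation text are known. The function name `bits_per_byte` is hypothetical, and this is not necessarily how these runs computed bpb.

```python
import math


def bits_per_byte(mean_loss_nats: float, num_tokens: int, num_bytes: int) -> float:
    """Convert mean per-token cross-entropy (nats) into bits per UTF-8 byte.

    Assumption: mean_loss_nats is averaged over num_tokens tokens that
    together encode exactly num_bytes bytes of text.
    """
    if num_tokens <= 0 or num_bytes <= 0:
        raise ValueError("token and byte counts must be positive")
    total_nats = mean_loss_nats * num_tokens  # total negative log-likelihood of the text
    total_bits = total_nats / math.log(2)     # convert nats to bits
    return total_bits / num_bytes             # normalize by bytes, not tokens


# Hypothetical example: the same 4,000-byte text (byte count from
# len(text.encode("utf-8"))) tokenized two ways.
bpb_a = bits_per_byte(2.8, num_tokens=1000, num_bytes=4000)  # finer tokenizer
bpb_b = bits_per_byte(3.5, num_tokens=800, num_bytes=4000)   # coarser tokenizer
print(round(bpb_a, 4), round(bpb_b, 4))
```

In this made-up example the per-token losses differ (2.8 vs. 3.5 nats), yet both tokenizations assign the same total likelihood to the same 4,000 bytes, so their bpb is identical. That invariance is what makes bpb a fair metric for comparing models with different vocabularies.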


