
Experiment Data

This is the data we used to generate the plots. Unfortunately, we cannot make the wandb dashboard public because we don't have the pro version.

Memory Profiling

  • This section shows how our continuous batching implementation improves GPU memory utilization
  • Here, the key graphs are the first two: stream utilization and GPU memory usage.
  • The top-left graph (stream utilization) shows the percentage of prompt generations in the stream that are actively generating tokens, i.e., not yet finished.
    • In static batching, the stream starts full; as generations in the stream complete, stream utilization decreases until every generation has finished and we refill.
  • In the second graph (GPU memory usage %), static batching shows large memory spikes followed by sharp drops, whereas continuous batching maintains consistent memory usage (see the scheduling sketch after this list).
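
To make the scheduling difference concrete, here is a minimal sketch (not our actual implementation) of the two policies: static batching refills the stream only after every slot has finished, while continuous batching backfills a slot the moment its generation completes. The `STREAM_SIZE` constant and the step-count model of a request are assumptions for illustration.

```python
import random

STREAM_SIZE = 8  # hypothetical number of concurrent slots in the stream

def static_batching(jobs):
    """Refill the stream only after *every* slot in it has finished."""
    util = []
    while jobs:
        stream = [jobs.pop() for _ in range(min(STREAM_SIZE, len(jobs)))]
        while stream:
            util.append(len(stream) / STREAM_SIZE)    # stream utilization
            stream = [steps - 1 for steps in stream]  # one decode step
            stream = [s for s in stream if s > 0]     # finished slots sit idle
    return util

def continuous_batching(jobs):
    """Backfill a slot with a waiting prompt the moment it frees up."""
    util, stream = [], []
    while jobs or stream:
        while jobs and len(stream) < STREAM_SIZE:
            stream.append(jobs.pop())                 # immediate refill
        util.append(len(stream) / STREAM_SIZE)
        stream = [steps - 1 for steps in stream]
        stream = [s for s in stream if s > 0]
    return util

random.seed(0)
jobs = [random.randint(1, 50) for _ in range(64)]  # decode steps per request
s, c = static_batching(list(jobs)), continuous_batching(list(jobs))
print(f"static     avg stream utilization: {sum(s) / len(s):.0%}")
print(f"continuous avg stream utilization: {sum(c) / len(c):.0%}")
```

Under this toy model, continuous batching keeps stream utilization near 100% until the queue drains, which mirrors the flat memory curve in the second graph.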

[Line charts from the W&B run set: GPU memory usage (%), tokens_per_second, stream utilization (%)]



Acceptance Rates across Generations

  • Note: each level is the average n-gram acceptance rate for one generation of a given prompt (with sequential training, step 0 is the first generation, and the step-7 model has all previous generations in its training data).
  • We see a steady increase in acceptance rate from one generation level to the next, which shows that our sequentially trained n-gram models are effective (a sketch of the computation follows below).
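
For reference, a minimal sketch of how a per-level average acceptance rate like this could be computed; the `(level, accepted, proposed)` record format is an assumption for illustration, not our exact logging code.

```python
from collections import defaultdict

def acceptance_rate_by_level(records):
    """records: (level, accepted_tokens, proposed_tokens) tuples, one per
    prompt generation; the level-k draft model is assumed to have been
    trained on generations 0..k-1 of each prompt."""
    accepted = defaultdict(int)
    proposed = defaultdict(int)
    for level, acc, prop in records:
        accepted[level] += acc
        proposed[level] += prop
    # acceptance rate = accepted draft tokens / proposed draft tokens
    return {lvl: accepted[lvl] / proposed[lvl] for lvl in sorted(accepted)}

# toy example: later generation levels should accept more draft tokens
records = [(0, 12, 40), (0, 10, 38), (7, 30, 40), (7, 33, 42)]
print(acceptance_rate_by_level(records))  # {0: 0.28..., 7: 0.76...}
```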
