
Tinystories-1M

Created on March 22 | Last edited on April 28

Local vs e2e - Blocks.0.hook_resid_post

Note: wandb doesn't seem to allow colouring points based on a "tag". In the following plots, the better curves (upper left, i.e. lower L_0 with a CE loss difference closer to zero) are from e2e runs.

[Plot: performance/eval/difference_ce_loss (y-axis, -0.4 to -0.1) vs sparsity/eval/L_0/blocks.0.hook_resid_post (x-axis, 10 to 70)]
Run set
16
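
Since the wandb panel cannot colour points by tag, one workaround is to pull the runs and plot them outside the UI. The sketch below is an assumption-laden illustration, not part of the original report: it assumes each run carries a tag such as "local" or "e2e" (hypothetical tag names), reads the two metric keys shown on the plot axes from each run's summary, and groups the points per tag. The `fetch_records` and `plot_groups` helpers are hypothetical names; they import wandb and matplotlib lazily so the grouping logic works without either installed.

```python
from collections import defaultdict

# Metric keys as they appear on the plot axes in this report.
L0_KEY = "sparsity/eval/L_0/blocks.0.hook_resid_post"
CE_KEY = "performance/eval/difference_ce_loss"


def group_by_tag(records, tags=("local", "e2e")):
    """Group (L_0, CE loss difference) points by the first matching tag.

    `records` is an iterable of dicts with keys "tags" (list of str) and
    "summary" (dict of metric name -> value). Runs missing either metric
    are skipped; runs carrying none of `tags` are grouped under "other".
    """
    groups = defaultdict(list)
    for rec in records:
        summary = rec["summary"]
        l0, ce = summary.get(L0_KEY), summary.get(CE_KEY)
        if l0 is None or ce is None:
            continue
        label = next((t for t in tags if t in rec["tags"]), "other")
        groups[label].append((l0, ce))
    # Sort each group by L_0 so the points can be drawn as a curve.
    return {label: sorted(points) for label, points in groups.items()}


def fetch_records(project_path):
    """Read tags and summary metrics via the wandb public API (needs network).

    `project_path` is a hypothetical "entity/project" string.
    """
    import wandb  # lazy import: only needed when actually fetching

    return [
        {
            "tags": list(run.tags),
            "summary": {k: run.summary.get(k) for k in (L0_KEY, CE_KEY)},
        }
        for run in wandb.Api().runs(project_path)
    ]


def plot_groups(groups):
    """Scatter each tag group in its own colour (requires matplotlib)."""
    import matplotlib.pyplot as plt  # lazy import: only needed when plotting

    for label, points in groups.items():
        xs, ys = zip(*points)
        plt.scatter(xs, ys, label=label)
    plt.xlabel(L0_KEY)
    plt.ylabel(CE_KEY)
    plt.legend()
    plt.show()
```

With network access, `plot_groups(group_by_tag(fetch_records("entity/project")))` would draw one colour per tag, where "entity/project" stands in for the real project path.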


Local vs e2e vs e2e+Downstream - Blocks.3.hook_resid_post


Run set
22


Local

Local Blocks.0.hook_resid_post



Run set
7


Local Blocks.3.hook_resid_post


Run set
7


Local Blocks.6.hook_resid_post



Run set
7


E2E (sparsity + KL divergence)

e2e Blocks.0.hook_resid_post



Run set
9


e2e Blocks.3.hook_resid_post


Run set
10



e2e Blocks.6.hook_resid_post


Run set
9


e2e+Downstream Blocks.3.hook_resid_post


Run set
3