
(with KL) GPT2 Learning functionally important features with end-to-end dictionary learning

Report containing Pareto frontiers for local SAEs, e2e SAEs, and e2e+ds SAEs
Created on August 22|Last edited on August 22
See https://wandb.ai/sparsify/gpt2 for all runs used in the paper, including the appendices (with the exception of the tinystories-1m runs, which can be found at https://wandb.ai/sparsify/tinystories-1m-2). The runs on the Pareto frontier for each method can be found in the plots below or by filtering on the wandb run tags "pareto", "local", "e2e", and "e2eds".
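As a rough illustration of what "Pareto frontier" means for the plots in this report, the sketch below picks the non-dominated runs from a list of (L_0, CE-loss difference) pairs. It assumes that lower L_0 is preferred and that a CE-loss difference closer to zero (that is, higher) is preferred. The function name `pareto_frontier` and the sample values are hypothetical and are not taken from the actual runs.

```python
from typing import List, Tuple

# One run summarised as (L_0 sparsity, CE-loss difference).
Point = Tuple[float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the runs that no other run beats on both axes.

    Assumption (not stated in the report): lower L_0 is better and a
    higher (less negative) CE-loss difference is better.
    """
    # Sort by L_0 ascending; on ties, put the better CE-loss difference first.
    ordered = sorted(points, key=lambda p: (p[0], -p[1]))
    frontier: List[Point] = []
    best_ce = float("-inf")
    for l0, ce in ordered:
        # Keep a run only if it improves on every sparser run seen so far.
        if ce > best_ce:
            frontier.append((l0, ce))
            best_ce = ce
    return frontier


# Hypothetical values for illustration only, not real run results.
runs = [
    (20.0, -0.60),
    (40.0, -0.35),
    (45.0, -0.40),   # dominated by (40.0, -0.35)
    (80.0, -0.15),
    (120.0, -0.15),  # dominated by (80.0, -0.15)
    (140.0, -0.05),
]
frontier = pareto_frontier(runs)
print(frontier)
```

Sorting first keeps the scan to a single pass: once runs are ordered by L_0, a run is on the frontier only if its CE-loss difference is strictly better than that of every sparser run.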

Blocks.2.hook_resid_pre


[Plot. Axes: Index; sparsity/eval/L_0/blocks.2.hook_resid_pre; performance/eval/difference_ce_loss]
Run set: 21 runs


Blocks.6.hook_resid_pre


Run set: 23 runs


Blocks.10.hook_resid_pre


Run set: 23 runs