Skip to main content

scale the bench

Created on August 23|Last edited on November 4

base metrics

specific questions

What are the parameters of the actual top performers?

What does plast_clip do?

Does it make things explode? If so, when? Is there a min. value for which it can use memory at all? For simplicity, we fix clip_weights. We also fix plast_proportion and n_hidden.

Computing group metrics from first 10 groups
05001k1.5kStep0.50.60.70.80.9
plast_clip: 100000, learning_rate: 0.0001
plast_clip: 100000, learning_rate: 0.00007
plast_clip: 300000, learning_rate: 0.00005
plast_clip: 70000, learning_rate: 0.0001
plast_clip: 30000, learning_rate: 0.0002
plast_clip: 50000, learning_rate: 0.0001
plast_clip: 10000, learning_rate: 0.0005
plast_clip: 30000, learning_rate: 0.0003
plast_clip: 30000, learning_rate: 0.0001
plast_clip: 10000, learning_rate: 0.0003
3_palindrome_dataset_vary_length
179

what areas gain any accuracy?

3_palindrome_dataset_vary_length
160
2_palindrome_dataset_vary_length
179

What areas blow up?

3_palindrome_dataset_vary_length
313
2_palindrome_dataset_vary_length
252


ok, now, what's the effect of clip_weights

What's the effect of the ratio of high-lr to low-lr parameters?


Run set
70



Run set
70