gmongaras1 / Cottention_Tests
Runs (177 total, 58 visualized; rows 1-20 shown)
10term_ELU_1GPU_768seqlen_16bz
4term_ELU_1GPU_768seqlen_16bz
3term_ELU_1GPU_768seqlen_16bz
2term_ELU_1GPU_768seqlen_16bz
ELU_1GPU_768seqlen_16bz
10term_ReLU_1GPU_768seqlen_16bz
4term_ReLU_1GPU_768seqlen_16bz
3term_ReLU_1GPU_768seqlen_16bz
2term_ReLU_1GPU_768seqlen_16bz
ReLU_1GPU_768seqlen_16bz
2term_ReLU_1GPU_768seqlen_16bz
ReLU_1GPU_768seqlen_16bz
ReLU_1GPU_768seqlen_16bz
10term_cosine_1GPU_768seqlen_16bz
3term_cosine_1GPU_768seqlen_16bz
2term_cosine_1GPU_768seqlen_16bz
cosine_1GPU_768seqlen_16bz
cosine_1GPU_768seqlen_16bz
cosine_1GPU_768seqlen_16bz
8termtaylor_softmax_2GPU_768seqlen_16bz
Charts (4)
perplexity (first 10 runs shown; x-axis: Step, 0 to 200k; y-axis ticks: 100000 to 300000). Runs plotted:
softmax_10termtaylorsoftmax_2GPU_768seqlen_16bz
softmax_80termtaylorsoftmax_2GPU_768seqlen_16bz
softmax_softmax_2GPU_768seqlen_16bz
softmax_2termtaylorsoftmax_2GPU_768seqlen_16bz
softmax_4termtaylorsoftmax_2GPU_768seqlen_16bz
softmax_80termtaylorsoftmax_2GPU_768seqlen_16bz
softmax_detachsumdim1_gate_outnorm_2GPU_768seqlen_16bz
softmax_detachsumdim2_gate_outnorm_2GPU_768seqlen_16bz
double_expgate_tanh_Sdenom_outnorm_2GPU_768seqlen_16bz
double_lineargate_Sdenom_outnorm_2GPU_768seqlen_16bz
lr (first 10 runs shown; x-axis: Step, 0 to 200k; y-axis ticks: 0 to 0.0001). Runs plotted: the same 10 runs listed under the perplexity panel.
loss (first 50 runs shown; x-axis: Step, 0 to 200k; y-axis ticks: 2 to 12). Runs plotted:
softmax_10termtaylorsoftmax_2GPU_768seqlen_16bz
softmax_80termtaylorsoftmax_2GPU_768seqlen_16bz
softmax_softmax_2GPU_768seqlen_16bz
softmax_2termtaylorsoftmax_2GPU_768seqlen_16bz
softmax_4termtaylorsoftmax_2GPU_768seqlen_16bz
softmax_80termtaylorsoftmax_2GPU_768seqlen_16bz
softmax_detachsumdim1_gate_outnorm_2GPU_768seqlen_16bz
softmax_detachsumdim2_gate_outnorm_2GPU_768seqlen_16bz
double_expgate_tanh_Sdenom_outnorm_2GPU_768seqlen_16bz
double_lineargate_Sdenom_outnorm_2GPU_768seqlen_16bz
double_cubegate_Sdenom_outnorm_2GPU_768seqlen_16bz
double_squaregate_Sdenom_outnorm_2GPU_768seqlen_16bz
double_expgate_Sdenom_outnorm_2GPU_768seqlen_16bz
expgate_highdenom_learnconst_2GPU_768seqlen_16bz
expgate_Sdenom_outnorm_2GPU_768seqlen_16bz
softmax_2GPU_768seqlen_16bz
expgate_highdenom_outnorm_2GPU_768seqlen_16bz
expgate_learnhighdenom_1GPU_256seqlen_32bz
expgate_highdenom_1GPU_256seqlen_32bz
expgate_outnorm_1GPU_256seqlen_32bz
expgate_1GPU_256seqlen_32bz
expgate_learndenom_1GPU_256seqlen_32bz
softmax_1GPU_256seqlen_32bz
expgate_learndenom_1GPU_256seqlen_32bz
lineargate_1GPU_256seqlen_32bz
lineargate_1GPU_256seqlen_32bz
squaredgate_1GPU_256seqlen_32bz
relugate2_1GPU_256seqlen_32bz
relugate_1GPU_256seqlen_32bz
softmax_decayv1_1GPU_256seqlen_32bz
softmax_decayv1_1GPU_256seqlen_32bz
expgate_1GPU_256seqlen_32bz
softmax_decayv1_1GPU_256seqlen_32bz
memmosaic_1GPU_256seqlen_32bz
softmax_learnablebase_1GPU_256seqlen_32bz
softmax_Covar_1GPU_256seqlen_32bz
softmax_L2Dist_1GPU_256seqlen_32bz
softmax_vardiv_1GPU_256seqlen_32bz
relus80termtaylorseries_1GPU_256seqlen_32bz
relus4termtaylorseries_1GPU_256seqlen_32bz
relusquared_1GPU_256seqlen_32bz
relulinear_1GPU_256seqlen_32bz
cosine80termtaylorseries_1GPU_256seqlen_32bz
cosine4termtaylorseries_1GPU_256seqlen_32bz
cosinesquared_1GPU_256seqlen_32bz
cosinelinear_1GPU_256seqlen_32bz
coshmax_1GPU_256seqlen_32bz
sinhmax_1GPU_256seqlen_32bz
softmax_decomposedodd_1GPU_256seqlen_32bz
softmax_decomposedeven_1GPU_256seqlen_32bz
log(loss) (first 50 runs shown; x-axis: Step, 0 to 200k; y-axis ticks: 0.5 to 2.5). Runs plotted: the same 50 runs listed under the loss panel.
System (15)