Clashluke's group workspace

2023-05-17 16:48:18
 53 | /body_ctx:0/block:0/reversible:1/write:0/scale_norm_act_linear:5/weight0                                               | (8, 1024, 1024) |  33,554,432
2023-05-17 16:48:18
 54 | /body_ctx:0/block:0/reversible:1/write:0/scale_norm_act_linear:5/weight2                                               | (8, 1024, 1024) |  33,554,432
2023-05-17 16:48:18
 55 | /optimizer:0//body_ctx:0/block:0/loss_fn:0/out_embd/ema:0/momentum_buffer                                              |  (8, 4096, 256) |  33,554,432
2023-05-17 16:48:18
 56 | /optimizer:0//body_ctx:0/block:0/loss_fn:0/out_embd/ema:1/momentum_buffer                                              |  (8, 4096, 256) |  33,554,432
2023-05-17 16:48:18
 57 | /optimizer:0//body_ctx:0/block:0/reversible:0/read:0/input_fn:0/scale_norm_act_linear:1/weight1/ema:0/momentum_buffer  | (8, 1024, 1024) |  33,554,432
2023-05-17 16:48:18
 58 | /optimizer:0//body_ctx:0/block:0/reversible:0/read:0/input_fn:0/scale_norm_act_linear:1/weight1/ema:1/momentum_buffer  | (8, 1024, 1024) |  33,554,432
2023-05-17 16:48:18
 59 | /optimizer:0//body_ctx:0/block:0/reversible:0/read:0/input_fn:0/scale_norm_act_linear:1/weight2/ema:0/momentum_buffer  | (8, 1024, 1024) |  33,554,432
2023-05-17 16:48:18
 60 | /optimizer:0//body_ctx:0/block:0/reversible:0/read:0/input_fn:0/scale_norm_act_linear:1/weight2/ema:1/momentum_buffer  | (8, 1024, 1024) |  33,554,432
2023-05-17 16:48:18
 61 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/input_fn:1/scale_norm_act_linear:4/weight0/ema:0/momentum_buffer | (8, 1024, 1024) |  33,554,432
2023-05-17 16:48:18
 62 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/input_fn:1/scale_norm_act_linear:4/weight0/ema:1/momentum_buffer | (8, 1024, 1024) |  33,554,432
2023-05-17 16:48:18
 63 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/input_fn:1/scale_norm_act_linear:4/weight1/ema:0/momentum_buffer | (8, 1024, 1024) |  33,554,432
2023-05-17 16:48:18
 64 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/input_fn:1/scale_norm_act_linear:4/weight1/ema:1/momentum_buffer | (8, 1024, 1024) |  33,554,432
2023-05-17 16:48:18
 65 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/scale_norm_act_linear:5/weight0/ema:0/momentum_buffer            | (8, 1024, 1024) |  33,554,432
2023-05-17 16:48:18
 66 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/scale_norm_act_linear:5/weight0/ema:1/momentum_buffer            | (8, 1024, 1024) |  33,554,432
2023-05-17 16:48:18
 67 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/scale_norm_act_linear:5/weight2/ema:0/momentum_buffer            | (8, 1024, 1024) |  33,554,432
2023-05-17 16:48:18
 68 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/scale_norm_act_linear:5/weight2/ema:1/momentum_buffer            | (8, 1024, 1024) |  33,554,432
2023-05-17 16:48:18
 69 | /body_ctx:0/block:0/reversible:0/read:0/_output:0/scale_norm_act_linear:2/weight0                                      | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 70 | /body_ctx:0/block:0/reversible:0/read:0/_output:0/scale_norm_act_linear:2/weight1                                      | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 71 | /body_ctx:0/block:0/reversible:0/read:0/input_fn:0/input_embed:1/inp_embd                                              | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 72 | /body_ctx:0/block:0/reversible:0/read:0/input_fn:0/scale_norm_act_linear:0/weight0                                     | (8, 1024, 4096) | 134,217,728
2023-05-17 16:48:18
 73 | /body_ctx:0/block:0/reversible:0/read:0/input_fn:0/scale_norm_act_linear:1/weight0                                     | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 74 | /body_ctx:0/block:0/reversible:0/read:0/linear:0/conv_weight                                                           | (8, 1024, 4096) | 134,217,728
2023-05-17 16:48:18
 75 | /body_ctx:0/block:0/reversible:1/write:0/_output:1/scale_norm_act_linear:6/weight0                                     | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 76 | /body_ctx:0/block:0/reversible:1/write:0/_output:1/scale_norm_act_linear:6/weight1                                     | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 77 | /body_ctx:0/block:0/reversible:1/write:0/input_fn:1/input_embed:3/inp_embd                                             | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 78 | /body_ctx:0/block:0/reversible:1/write:0/input_fn:1/scale_norm_act_linear:3/weight0                                    | (8, 1024, 4096) | 134,217,728
2023-05-17 16:48:18
 79 | /body_ctx:0/block:0/reversible:1/write:0/input_fn:1/scale_norm_act_linear:4/weight2                                    | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 80 | /body_ctx:0/block:0/reversible:1/write:0/scale_norm_act_linear:5/weight1                                               | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 81 | /optimizer:0//body_ctx:0/block:0/reversible:0/read:0/_output:0/scale_norm_act_linear:2/weight0/ema:0/momentum_buffer   | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 82 | /optimizer:0//body_ctx:0/block:0/reversible:0/read:0/_output:0/scale_norm_act_linear:2/weight0/ema:1/momentum_buffer   | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 83 | /optimizer:0//body_ctx:0/block:0/reversible:0/read:0/_output:0/scale_norm_act_linear:2/weight1/ema:0/momentum_buffer   | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 84 | /optimizer:0//body_ctx:0/block:0/reversible:0/read:0/_output:0/scale_norm_act_linear:2/weight1/ema:1/momentum_buffer   | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 85 | /optimizer:0//body_ctx:0/block:0/reversible:0/read:0/input_fn:0/input_embed:1/inp_embd/ema:0/momentum_buffer           | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 86 | /optimizer:0//body_ctx:0/block:0/reversible:0/read:0/input_fn:0/input_embed:1/inp_embd/ema:1/momentum_buffer           | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 87 | /optimizer:0//body_ctx:0/block:0/reversible:0/read:0/input_fn:0/scale_norm_act_linear:0/weight0/ema:0/momentum_buffer  | (8, 1024, 4096) | 134,217,728
2023-05-17 16:48:18
 88 | /optimizer:0//body_ctx:0/block:0/reversible:0/read:0/input_fn:0/scale_norm_act_linear:0/weight0/ema:1/momentum_buffer  | (8, 1024, 4096) | 134,217,728
2023-05-17 16:48:18
 89 | /optimizer:0//body_ctx:0/block:0/reversible:0/read:0/input_fn:0/scale_norm_act_linear:1/weight0/ema:0/momentum_buffer  | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 90 | /optimizer:0//body_ctx:0/block:0/reversible:0/read:0/input_fn:0/scale_norm_act_linear:1/weight0/ema:1/momentum_buffer  | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 91 | /optimizer:0//body_ctx:0/block:0/reversible:0/read:0/linear:0/conv_weight/ema:0/momentum_buffer                        | (8, 1024, 4096) | 134,217,728
2023-05-17 16:48:18
 92 | /optimizer:0//body_ctx:0/block:0/reversible:0/read:0/linear:0/conv_weight/ema:1/momentum_buffer                        | (8, 1024, 4096) | 134,217,728
2023-05-17 16:48:18
 93 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/_output:1/scale_norm_act_linear:6/weight0/ema:0/momentum_buffer  | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 94 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/_output:1/scale_norm_act_linear:6/weight0/ema:1/momentum_buffer  | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 95 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/_output:1/scale_norm_act_linear:6/weight1/ema:0/momentum_buffer  | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 96 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/_output:1/scale_norm_act_linear:6/weight1/ema:1/momentum_buffer  | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 97 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/input_fn:1/input_embed:3/inp_embd/ema:0/momentum_buffer          | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 98 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/input_fn:1/input_embed:3/inp_embd/ema:1/momentum_buffer          | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
 99 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/input_fn:1/scale_norm_act_linear:3/weight0/ema:0/momentum_buffer | (8, 1024, 4096) | 134,217,728
2023-05-17 16:48:18
100 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/input_fn:1/scale_norm_act_linear:3/weight0/ema:1/momentum_buffer | (8, 1024, 4096) | 134,217,728
2023-05-17 16:48:18
101 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/input_fn:1/scale_norm_act_linear:4/weight2/ema:0/momentum_buffer | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
102 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/input_fn:1/scale_norm_act_linear:4/weight2/ema:1/momentum_buffer | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
103 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/scale_norm_act_linear:5/weight1/ema:0/momentum_buffer            | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
104 | /optimizer:0//body_ctx:0/block:0/reversible:1/write:0/scale_norm_act_linear:5/weight1/ema:1/momentum_buffer            | (8, 4096, 1024) | 134,217,728
2023-05-17 16:48:18
Parameters: 470,000,640
2023-05-17 16:48:18
Buffers:    940,001,280
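The arithmetic behind the table and its two totals can be checked directly. In every row shown, the last column equals exactly 4 × the number of elements in the shape. That is consistent with a float32 byte count, but the log never states the unit, so treat that reading as an assumption. Separately, every weight visible in this excerpt has two optimizer entries (`ema:0` and `ema:1` momentum buffers), and the Buffers total is exactly twice the Parameters total. The sketch below reproduces both observations using rows copied from the log. The names `ROWS`, `PARAMETERS`, `BUFFERS`, and `ratio` are illustrative only.

```python
import math

# A few rows copied verbatim from the table above: (name, shape, reported count).
# The unit of the count column is not stated in the log; this only tests the
# hypothesis that count == 4 * number_of_elements (e.g. 4-byte float32).
ROWS = [
    ("/body_ctx:0/block:0/reversible:1/write:0/scale_norm_act_linear:5/weight0",
     (8, 1024, 1024), 33_554_432),
    ("/optimizer:0//body_ctx:0/block:0/loss_fn:0/out_embd/ema:0/momentum_buffer",
     (8, 4096, 256), 33_554_432),
    ("/body_ctx:0/block:0/reversible:0/read:0/linear:0/conv_weight",
     (8, 1024, 4096), 134_217_728),
]

# Totals copied from the summary lines of the log.
PARAMETERS = 470_000_640
BUFFERS = 940_001_280


def ratio(shape, count):
    """Return the reported count divided by the element count of `shape`."""
    return count / math.prod(shape)


for name, shape, count in ROWS:
    print(f"{shape}: {math.prod(shape):,} elements, count/elements = {ratio(shape, count)}")

# Two momentum buffers (ema:0, ema:1) per weight would give exactly 2x the
# parameter total, which is what the log reports.
print(BUFFERS == 2 * PARAMETERS)
```

All three rows print a ratio of `4.0`, and the final comparison prints `True`.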
2023-05-17 16:53:04
[    2/65536] Loss:  5.675 - Accuracy:    0.002 | LearningRate: 0.00000 | StepTime: 287.886728s - Rate:  10,926.9 Tokens/s
2023-05-17 16:56:59
[    3/65536] Loss:  5.672 - Accuracy:    0.002 | LearningRate: 0.00000 | StepTime: 233.569828s - Rate:   8,039.9 Tokens/s
2023-05-17 17:00:53
[    4/65536] Loss:  5.659 - Accuracy:    0.002 | LearningRate: 0.00000 | StepTime: 233.579007s - Rate:   6,941.3 Tokens/s