Igoro's group workspace

2021-11-02 22:56:00
    37: ParallelTransformerLayerPipe
2021-11-02 22:56:00
    38: ParallelTransformerLayerPipe
2021-11-02 22:56:00
    39: ParallelTransformerLayerPipe
2021-11-02 22:56:00
    40: ParallelTransformerLayerPipe
2021-11-02 22:56:00
    41: ParallelTransformerLayerPipe
2021-11-02 22:56:00
    42: ParallelTransformerLayerPipe
2021-11-02 22:56:00
    43: ParallelTransformerLayerPipe
2021-11-02 22:56:00
    44: ParallelTransformerLayerPipe
2021-11-02 22:56:00
    45: ParallelTransformerLayerPipe
2021-11-02 22:56:00
    46: _post_transformer_block
2021-11-02 22:56:00
    47: NormPipe
2021-11-02 22:56:00
    48: ParallelLinearPipe
2021-11-02 22:56:00
  loss: partial
2021-11-02 22:56:00
Configuring Optimizer type: Adam with params: {'betas': [0.9, 0.95], 'eps': 1e-08, 'lr': 9.7e-05}
2021-11-02 22:56:00
> learning rate decay style: cosine
2021-11-02 22:56:00
DeepSpeed is enabled.
2021-11-02 22:56:00
[2021-11-02 22:56:00,649] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed info: version=0.3.15+eb7f5cf, git-hash=eb7f5cf, git-branch=main
2021-11-02 22:56:00
[2021-11-02 22:56:00,649] [WARNING] [config.py:77:_sanity_check] DeepSpeedConfig: cpu_offload is deprecated. Please use offload_optimizer.