Regression Report: wandb

[['?we=costa-huang&wpn=trl&ceik=tracker_project_name&cen=log_with&metrics=env/reward_mean&metrics=env/reward_std&metrics=objective/kl_coef&metrics=objective/kl&metrics=objective/entropy&metrics=ppo/std_scores&metrics=ppo/mean_scores&metrics=ppo/learning_rate&metrics=ppo/mean_non_score_reward&metrics=ppo/loss/value&metrics=ppo/loss/total&metrics=ppo/loss/policy&metrics=ppo/policy/advantages_mean&metrics=ppo/policy/approxkl&metrics=ppo/policy/clipfrac&metrics=ppo/policy/entropy&metrics=ppo/returns/mean&metrics=ppo/returns/var', 'wandb?tag=gpt2-sentiment&cl=sentiment analysis (PR-410)']]

Costa

Created on June 8 | Last edited on June 8

[Figure: six line-chart panels from the trl project, each plotted against Steps on the x-axis, with the y-axis labeled "Episodic Return". Group metrics are computed from the first 50 groups. Panels: env/reward_mean, env/reward_std, objective/kl_coef, objective/kl, objective/entropy, ppo/std_scores.]
sentiment analysis (PR-410) 236
