Regression Report: wandb

Query: ?we=costa-huang&wpn=trl&ceik=tracker_project_name&cen=log_with
Metrics: env/reward_mean, env/reward_std, objective/kl_coef, objective/kl, objective/entropy, ppo/std_scores, ppo/mean_scores, ppo/learning_rate, ppo/mean_non_score_reward, ppo/loss/value, ppo/loss/total, ppo/loss/policy, ppo/policy/advantages_mean, ppo/policy/approxkl, ppo/policy/clipfrac, ppo/policy/entropy, ppo/returns/mean, ppo/returns/var
Run filter: wandb?tag=gpt2-sentiment&cl=sentiment analysis (PR-410)
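The filter above is written as URL query parameters, with `metrics` repeated once per metric. As a minimal sketch of how such a string can be inspected, the snippet below parses a shortened copy of it with Python's standard library. The variable names are illustrative, only three of the metrics are kept for brevity, and the leading `?` is dropped by hand; nothing here is taken from the tool that produced the report.

```python
from urllib.parse import parse_qs

# Shortened copy of the report's query string (assumption: only three of
# the metrics are kept here, and the leading "?" has been removed).
query = (
    "we=costa-huang&wpn=trl&ceik=tracker_project_name&cen=log_with"
    "&metrics=env/reward_mean&metrics=objective/kl&metrics=ppo/loss/total"
)

# parse_qs maps each key to a list, so repeated keys such as "metrics"
# are collected in the order they appear.
params = parse_qs(query)
metrics = params["metrics"]

print(params["we"][0], params["wpn"][0])
for name in metrics:
    print(name)
```

Because every value comes back as a list, single-valued keys such as `we` and `wpn` are read with `[0]`, while `metrics` is used as the full list.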
Created on June 8 | Last edited on June 8

[Figure: six line-plot panels, each showing Episodic Return (y-axis) against Steps (x-axis, ticks at 100, 200, 300). Group metrics are computed from the first 50 groups.]
sentiment analysis (PR-410)
236