Regression Report: wandb

Report filters:
?we=costa-huang&wpn=trl&ceik=tracker_project_name&cen=log_with
Metrics: env/reward_mean, env/reward_std, objective/kl_coef, objective/kl, objective/entropy, ppo/std_scores, ppo/mean_scores, ppo/learning_rate, ppo/mean_non_score_reward, ppo/loss/value, ppo/loss/total, ppo/loss/policy, ppo/policy/advantages_mean, ppo/policy/approxkl, ppo/policy/clipfrac, ppo/policy/entropy, ppo/returns/mean, ppo/returns/var
Run sets:
- wandb?tag=gpt2-sentiment&tag=rlops-pilot&cl=sentiment analysis (PR-410)
- wandb?tag=gpt2-sentiment&tag=pr-423&cl=sentiment analysis (PR-423)
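The report's filters are written in URL query syntax, where repeated keys (metrics, tag) carry several values. As a minimal sketch, assuming only that format, Python's standard urllib.parse.parse_qs splits them into key/value lists. The helper name parse_query is hypothetical, and reading we and wpn as the W&B entity and project is an assumption, not something this report states.

```python
from urllib.parse import parse_qs

# Filter string copied from the report; the keys' meanings are assumed, not documented here.
FILTERS = (
    "?we=costa-huang&wpn=trl&ceik=tracker_project_name&cen=log_with"
    "&metrics=env/reward_mean&metrics=env/reward_std"
    "&metrics=objective/kl_coef&metrics=objective/kl&metrics=objective/entropy"
    "&metrics=ppo/std_scores&metrics=ppo/mean_scores&metrics=ppo/learning_rate"
    "&metrics=ppo/mean_non_score_reward"
    "&metrics=ppo/loss/value&metrics=ppo/loss/total&metrics=ppo/loss/policy"
    "&metrics=ppo/policy/advantages_mean&metrics=ppo/policy/approxkl"
    "&metrics=ppo/policy/clipfrac&metrics=ppo/policy/entropy"
    "&metrics=ppo/returns/mean&metrics=ppo/returns/var"
)

# One query string per run set; "cl" holds the label shown in the report.
RUN_SETS = [
    "wandb?tag=gpt2-sentiment&tag=rlops-pilot&cl=sentiment analysis (PR-410)",
    "wandb?tag=gpt2-sentiment&tag=pr-423&cl=sentiment analysis (PR-423)",
]


def parse_query(s: str) -> dict[str, list[str]]:
    """Hypothetical helper: parse the text after the first '?' into key -> list of values."""
    _, _, query = s.partition("?")
    return parse_qs(query, keep_blank_values=True)


filters = parse_query(FILTERS)
print(filters["we"][0], filters["wpn"][0])  # costa-huang trl
print(len(filters["metrics"]))  # 18
for rs in RUN_SETS:
    q = parse_query(rs)
    print(q["cl"][0], "->", q["tag"])
```

Because parse_qs keeps every value of a repeated key in order, the 18 metrics and both tags of each run set survive intact.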
Created on June 9 | Last edited on June 9

[Figure: six panels plotting Episodic Return against Steps (0 to 80)]
sentiment analysis (PR-410): 10 runs
sentiment analysis (PR-423): 9 runs