Regression Report: wandb

[['?we=costa-huang&wpn=trl&xaxis=_step&ceik=trl_ppo_trainer_config.value.tracker_project_name&cen=trl_ppo_trainer_config.value.log_with&metrics=env/reward_mean&metrics=objective/kl', 'wandb?tag=gpt2-sentiment&tag=rlops-pilot&cl=sentiment analysis (PR-410)', 'wandb?tag=gpt2-sentiment&tag=pr-457&tag=sgd&cl=sentiment analysis SGD', 'wandb?tag=gpt2-sentiment&tag=pr-457&tag=adam&tag=mideps&cl=sentiment analysis Adam w/ eps=4e-3']]
Created on July 10 | Last edited on July 10

[Figure: four panels of Episodic Return. Two panels are plotted against Steps and two against Time (minutes).]
sentiment analysis (PR-410): 10
sentiment analysis SGD: 10
sentiment analysis Adam w/ eps=4e-3: 20