MuJoCo Demo

Created on July 21|Last edited on July 21
Findings on HalfCheetah-v2
In this benchmark, we studied the performance of DDPG, TD3, and PPO. Overall, we find that TD3 achieves the highest returns, which is corroborated by the TD3 paper.
The agent trained with PPO achieves a return of around 1700 by walking on the HalfCheetah's head, which explains its poor performance compared to TD3 and DDPG.
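For readers unfamiliar with the metric, the returns quoted here are episodic returns: the undiscounted sum of rewards collected over one episode, plotted against the global step count. The following is a minimal sketch of that bookkeeping, not the code used to produce these runs. The function name `episodic_returns` and the `(reward, done)` input format are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def episodic_returns(steps: Iterable[Tuple[float, bool]]) -> List[Tuple[int, float]]:
    """Return (global_step, episodic_return) pairs, one per finished episode.

    `steps` is a hypothetical stream of (reward, done) pairs, one per
    environment step, where `done` marks the last step of an episode.
    """
    results: List[Tuple[int, float]] = []
    running = 0.0  # sum of rewards in the current episode
    for global_step, (reward, done) in enumerate(steps, start=1):
        running += reward
        if done:
            # Episode finished: record its return at the current global step.
            results.append((global_step, running))
            running = 0.0
    return results


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the benchmark runs.
    toy_stream = [(1.0, False), (2.0, True), (0.5, True)]
    for step, ret in episodic_returns(toy_stream):
        print(f"global_step={step} episodic_return={ret}")
```

Each recorded pair corresponds to one point on an episodic return curve, so a policy that moves forward inefficiently accumulates less reward per episode and plots lower.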
Figure: Episodic Return vs. global_step (x-axis: 500k to 1.5M; y-axis: 0 to 10000)