Reports
Created by
Created On
Last edited
AirSim-SAC-steps
Learning to perform a custom OpenAI Gym task, landing a drone autonomously in Microsoft AirSim (goal score): AirSim-v0 (1000). Soft Actor-Critic (SAC) is trained on the cost function for this task. SAC achieves the goal score for the task, as verified by the "mean episode reward (RL)" chart. The learned SAC model is used to generate expert demonstrations of the task. Generative Adversarial Imitation Learning (GAIL) is then trained on these demos to learn to imitate the task.
1
2020-10-03
SAC-without-AirSim-steps
Learning to perform several OpenAI Gym and MuJoCo control tasks (goal scores): Pendulum-v0 (-200), CartPole-v1 (500), LunarLanderContinuous-v2 (200), Hopper-v2 (3500), and HalfCheetah-v2 (4800). Soft Actor-Critic (SAC) is trained on the cost function of these tasks. SAC achieves the goal scores for the tasks, as verified by the "mean episode reward (RL)" chart. The learned SAC models are used to generate expert demonstrations of each task. GAIL is then trained on these demos to learn to imitate the tasks.
1
2020-09-29
GAIL-Gym-steps
Learning to imitate several OpenAI Gym and MuJoCo control tasks from demos of each task: Pendulum-v0 (-200), CartPole-v1 (500), LunarLanderContinuous-v2 (200), Hopper-v2 (3500), and HalfCheetah-v2 (4800). Imitation is defined as matching the learned model's score (mean, std) with the expert's score (mean, std). We look at the imitation accuracy of Generative Adversarial Imitation Learning (GAIL), which learns to imitate each task from just 5 demonstrations (Lunar Lander takes 10), demonstrating sample efficiency.
1
2020-09-28
Hopper-BC-GAIL-steps
Learning to imitate the MuJoCo Hopper-v2 control task from demos of the task. The goal is to make a 2D one-legged robot hop forward as fast as possible, achieving a score of at least 3000. Imitation roughly means matching the learned model's score (mean, std) with the expert's score (mean, std). Here, we compare the performance of Behavioral Cloning (BC) and Generative Adversarial Imitation Learning (GAIL) on learning the task. GAIL learns to imitate with just 5 demonstrations of the task, demonstrating sample efficiency.
1
2020-09-28
Pendulum-BC-GAIL-steps
Learning to imitate the OpenAI Gym Pendulum-v0 control task from demonstrations of the task. The goal is to balance an inverted pendulum upright, achieving a score of at least -200. Imitation is defined as matching the learned model's score (mean, std) with the expert's score (mean, std). Here, we compare the performance of Behavioral Cloning (BC) and Generative Adversarial Imitation Learning (GAIL) on the task. GAIL learns to imitate with just 5 demonstrations of the task, demonstrating sample-efficiency.
1
2020-09-28
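The reports above define imitation as matching the learned model's score (mean, std) with the expert's score (mean, std). A minimal sketch of such a comparison follows. The helper names `score_summary` and `imitates`, the 10% tolerance, and the sample returns are all hypothetical, chosen for illustration; they are not taken from the reports and do not reflect the authors' evaluation code.

```python
from statistics import mean, pstdev


def score_summary(episode_returns):
    """Return (mean, std) of a list of per-episode returns."""
    return mean(episode_returns), pstdev(episode_returns)


def imitates(learner_returns, expert_returns, rel_tol=0.1):
    """Hypothetical check: the learner's mean is within rel_tol of the
    expert's mean, and its std does not exceed the expert's std by more
    than the same margin. The tolerance is an assumption."""
    l_mean, l_std = score_summary(learner_returns)
    e_mean, e_std = score_summary(expert_returns)
    margin = rel_tol * abs(e_mean)
    return abs(l_mean - e_mean) <= margin and l_std <= e_std + margin


# Made-up returns for illustration only, not results from the reports.
expert = [3510.0, 3490.0, 3500.0, 3505.0, 3495.0]
learner = [3400.0, 3550.0, 3480.0, 3520.0, 3450.0]
print(imitates(learner, expert))
```

A relative tolerance is used here so the same check can be applied to tasks with very different score scales, such as Pendulum-v0 and HalfCheetah-v2; an absolute threshold per task would be an equally reasonable choice.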