All Intervention RL Experiments

Created on October 17 | Last edited on November 28

Methodology

A2C Results

PPO Results

Mountain Car Results

Hyperparameter Sweep: Solve the Environment

Hyperparameter Sweep: Solve the Environment (2; Fewer Parameters)

Hyperparameter Sweep: Solve the Environment (3; Larger Catastrophe Zone)

Hyperparameter Sweep: Step 64; Batch Size 256

Hyperparameter Sweep: Step 16; Batch Size 32

Hyperparameter Sweep: Alpha, Beta Values

Hyperparameter Sweep: Blocker Type

Hyperparameter Sweep: Alpha, Beta Values (Updated)

Hyperparameter Sweep: New Oversight Phase, Test Bonuses

Hyperparameter Sweep: New Oversight Phase, Test Bonuses

Hyperparameter Sweep: New Oversight Phase, Test Bonuses

Intervention, Catastrophe, HIRL


Run sets: Intervention, Catastrophe, HIRL, Expert, Run set 5


Lunar Lander Results

Hyperparameter Sweep: None Case

Hyperparameter Sweep: Expert Case

Hyperparameter Sweep: HIRL Case

Hyperparameter Sweep: Intervention Sweep

Hyperparameter Sweep: HIRL Case (Shorter Oversight Phase)

Hyperparameter Sweep: Comparing Intervention over Shorter Oversight Phase

Hyperparameter Sweep: Comparing Intervention over Shorter Oversight Phase


Run sets: HIRL, Expert, Run set 4