All Intervention RL Experiments
Created on October 17 | Last edited on November 28
Methodology
A2C Results
PPO Results
Mountain Car Results
Hyperparameter Sweep: Solve the Environment
Hyperparameter Sweep: Solve the Environment (2; Fewer Parameters)
Hyperparameter Sweep: Solve the Environment (3; Larger Catastrophe Zone)
Hyperparameter Sweep: Step 64; Batch Size 256
Hyperparameter Sweep: Step 16; Batch Size 32
Hyperparameter Sweep: Alpha, Beta Values
Hyperparameter Sweep: Blocker Type
Hyperparameter Sweep: Alpha, Beta Values (Updated)
Hyperparameter Sweep: New Oversight Phase, Test Bonuses
Hyperparameter Sweep: New Oversight Phase, Test Bonuses
Hyperparameter Sweep: New Oversight Phase, Test Bonuses
Intervention, Catastrophe, HIRL
Run sets: Intervention (0), Catastrophe (0), HIRL (0), Expert (0), Run set 5 (0)
Lunar Lander Results
Hyperparameter Sweep: None Case
Hyperparameter Sweep: Expert Case
Hyperparameter Sweep: HIRL Case
Hyperparameter Sweep: Intervention Sweep
Hyperparameter Sweep: HIRL Case (Shorter Oversight Phase)
Hyperparameter Sweep: Comparing Intervention over Shorter Oversight Phase
Hyperparameter Sweep: Comparing Intervention over Shorter Oversight Phase
Run sets: HIRL (0), Expert (0), Run set 4 (0), Run set 4 (0)