Complete Mountain Car Offline Experiments

Created on March 4 | Last edited on April 24

Experiment 0: Online Training with Interventions; Testing Penalty Values

Experiment 1: Online Clean Training + Clean Data Collection

Experiment 2: Online Training with Interventions + Data Collection without Interventions

Experiment 3: Online Training with Interventions + Data Collection with Interventions

Experiment 4: Re-Running Online Training with Interventions + Data Collection without Interventions

Experiment 5: Online Training without Interventions + Data Collection with Interventions

Experiment 8: Updated Experiment (3) with Additional Duplicate Catastrophe Transitions

Experiment 9: Updated Experiment (3) with Additional Diverse Catastrophe Transitions

Reminder on Experiment (3): Online Training with Interventions & Data Collection with Interventions + Additional Catastrophe Data Collection.

Experiment 9C: Evaluation Penalty set to -4 (1000 Episodes, 20K Epochs)

Mean and Standard Deviation of Max Eval Reward over 5 Seeds:
  • With Human Actions: -115.2 ± 37.7
  • Without Human Actions: -111.6 ± 25.9
Mean and Standard Deviation of Entropy of Q-Values on Catastrophic States:
  • With Human Actions: 0.816 ± 0.0502
  • Without Human Actions: 1.026 ± 0.0187
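The report does not state how the entropy of the Q-values is computed. One common reading, sketched below, is to turn each state's Q-values into a softmax distribution over actions and take its Shannon entropy in nats. The softmax temperature of 1, the natural logarithm, and the function names `q_value_entropy` and `mean_entropy` are assumptions for illustration, not the author's confirmed method. Under this reading, the three discrete actions of Mountain Car give a maximum entropy of ln 3 ≈ 1.099, reached when the Q-values are equal across actions, so values closer to that bound indicate less differentiated Q-values on the catastrophic states.

```python
import math


def q_value_entropy(q_values):
    """Entropy (in nats) of softmax(q_values) for a single state.

    Assumption: Q-values are converted to action probabilities with a
    temperature-1 softmax; the report does not confirm this choice.
    """
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(q_values)
    log_z = m + math.log(sum(math.exp(q - m) for q in q_values))
    log_p = [q - log_z for q in q_values]
    # H = -sum_a p(a) * log p(a)
    return -sum(math.exp(lp) * lp for lp in log_p)


def mean_entropy(q_table):
    """Average the per-state entropy over a list of states' Q-values."""
    return sum(q_value_entropy(q) for q in q_table) / len(q_table)


if __name__ == "__main__":
    # Hypothetical Q-values for two catastrophic states (3 actions each).
    catastrophic_q = [[-40.0, -41.5, -39.0], [-55.0, -55.2, -54.9]]
    print(f"mean entropy: {mean_entropy(catastrophic_q):.3f} nats")
```

A lower entropy means the Q-values single out one action more sharply in those states; it does not by itself say whether the preferred action is the safe one.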


Experiment 9D: Evaluation Penalty set to -12 (1000 Episodes, 20K Epochs)

Mean and Standard Deviation of Max Eval Reward over 5 Seeds:
  • With Human Actions: -114.4 ± 12.7
  • Without Human Actions: -111.6 ± 25.9


Experiment 9E: Re-Running Experiment 9D with New Data (1000 Episodes, 20K Epochs)

Mean and Standard Deviation of Max Eval Reward over 5 Seeds:
  • With Human Actions: -106.2 ± 14.3
  • Without Human Actions: -112.0 ± 16.4
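For reference, a "mean ± standard deviation of max eval reward over 5 seeds" figure like the ones reported in these experiments could be produced as sketched below: take the best evaluation reward reached by each seed, then aggregate across seeds. The function name `summarize_max_eval_reward` and the input layout (one list of evaluation rewards per seed) are hypothetical, and the report does not say whether the sample or population standard deviation was used; this sketch uses the sample version (`statistics.stdev`), with `statistics.pstdev` as the drop-in alternative.

```python
import statistics


def summarize_max_eval_reward(eval_curves):
    """Return (mean, std) of the per-seed maximum evaluation reward.

    eval_curves: one sequence of evaluation rewards per seed
    (hypothetical layout; needs at least two seeds for stdev).
    """
    best_per_seed = [max(curve) for curve in eval_curves]
    mean = statistics.mean(best_per_seed)
    # Sample standard deviation (n - 1 denominator); an assumption.
    std = statistics.stdev(best_per_seed)
    return mean, std


if __name__ == "__main__":
    # Made-up evaluation curves for three seeds, for illustration only.
    curves = [
        [-200.0, -150.0, -118.0],
        [-200.0, -131.0, -140.0],
        [-200.0, -109.0, -112.0],
    ]
    mean, std = summarize_max_eval_reward(curves)
    print(f"{mean:.1f} ± {std:.1f}")
```

Because the statistic is a maximum over each run, it is optimistic relative to final or average evaluation reward, which is worth keeping in mind when comparing the two conditions.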


Experiment 9F: Re-Running Experiment 9D with New Data (2000 Episodes, 40K Epochs)


