Complete Mountain Car Offline Experiments
Created on March 4 | Last edited on April 24
Experiment 0: Online Training with Interventions; Testing Penalty Values
Experiment 1: Online Clean Training + Clean Data Collection
Experiment 2: Online Training with Interventions + Data Collection without Interventions
Experiment 3: Online Training with Interventions + Data Collection with Interventions
Experiment 4: Re-Running Online Training with Interventions + Data Collection without Interventions
Experiment 5: Online Training without Interventions + Data Collection with Interventions
Experiment 8: Updated Experiment (3) with Additional Duplicate Catastrophe Transitions
Experiment 9: Updated Experiment (3) with Additional Diverse Catastrophe Transitions
Reminder on Experiment (3): Online Training with Interventions + Data Collection with Interventions, plus Additional Catastrophe Data Collection.
Experiment 9C: Evaluation Penalty set to -4 (1000 Episodes, 20K Epochs)
Mean and Standard Deviation of Max Eval Reward over 5 Seeds:
- With Human Actions: -115.2 ± 37.7
- Without Human Actions: -111.6 ± 25.9
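The "mean ± standard deviation of max eval reward" statistic can be reproduced roughly as follows. This is a minimal sketch, not the code used for the report: the function name `summarize_max_eval_reward` and the toy curves are hypothetical, and the report does not say whether the population or the sample standard deviation was used, so `ddof` is left as a parameter.

```python
import numpy as np

def summarize_max_eval_reward(eval_curves, ddof=0):
    """Return (mean, std) of the best eval reward reached by each seed.

    eval_curves: one sequence of eval rewards per seed (lengths may differ).
    ddof: 0 for population std, 1 for sample std (assumption: not stated in the report).
    """
    # Take the maximum eval reward within each seed's curve first.
    max_per_seed = np.array([np.max(curve) for curve in eval_curves], dtype=float)
    # Then aggregate those per-seed maxima across seeds.
    return max_per_seed.mean(), max_per_seed.std(ddof=ddof)

# Made-up eval curves for two seeds, for illustration only.
toy_curves = [[-3.0, -1.0, -2.0], [-5.0, -3.0, -4.0]]
mean, std = summarize_max_eval_reward(toy_curves)
print(f"{mean:.1f} ± {std:.1f}")
```

With only 5 seeds, the choice of `ddof` changes the reported spread noticeably, so it is worth fixing it once and using it for both conditions.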
Mean and Standard Deviation of Entropy of Q-Values on Catastrophic States:
- With Human Actions: 0.816 ± 0.0502
- Without Human Actions: 1.026 ± 0.0187
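One plausible way to compute an "entropy of Q-values" is to turn each state's Q-values into a softmax distribution over actions and take its Shannon entropy in nats. The sketch below assumes that definition; the softmax step, the `temperature` parameter, and the function name `q_value_entropy` are assumptions rather than details taken from the report. Under this definition, the maximum for three discrete actions is ln 3 ≈ 1.0986, which is consistent with the reported values lying at or below about 1.03.

```python
import numpy as np

def q_value_entropy(q_values, temperature=1.0):
    """Entropy (in nats) of softmax(Q / temperature) along the last axis.

    q_values: array of shape (..., num_actions), e.g. Q-values on catastrophic states.
    temperature: softmax temperature (assumption: 1.0; not stated in the report).
    """
    q = np.asarray(q_values, dtype=float) / temperature
    # Shift by the max for numerical stability; softmax is invariant to this shift.
    q = q - q.max(axis=-1, keepdims=True)
    # Log-softmax, so near-zero probabilities do not produce log(0).
    log_probs = q - np.log(np.exp(q).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    return -(probs * log_probs).sum(axis=-1)

# Made-up Q-values for two states with three actions each, for illustration only.
toy_q = np.array([[1.0, 1.0, 1.0],    # indifferent between actions: high entropy
                  [5.0, 0.0, -5.0]])  # strong preference: low entropy
print(q_value_entropy(toy_q))
```

Lower entropy on catastrophic states means the Q-function separates the actions more sharply there; averaging the per-state entropies within a seed and then taking mean and standard deviation across seeds would give a statistic of the same form as the one above.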
Experiment 9D: Evaluation Penalty set to -12 (1000 Episodes, 20K Epochs)
Mean and Standard Deviation of Max Eval Reward over 5 Seeds:
- With Human Actions: -114.4 ± 12.7
- Without Human Actions: -111.6 ± 25.9
Experiment 9E: Re-Running Experiment 9D with New Data (1000 Episodes, 20K Epochs)
Mean and Standard Deviation of Max Eval Reward over 5 Seeds:
- With Human Actions: -106.2 ± 14.3
- Without Human Actions: -112.0 ± 16.4
Experiment 9F: Re-Running Experiment 9D with New Data (2000 Episodes, 40K Epochs)