Complete Mountain Car Offline Experiments

Created on March 4 | Last edited on April 24
Experiment 0: Online Training with Interventions; Testing Penalty Values
Experiment 1: Online Clean Training + Clean Data Collection
Experiment 2: Online Training with Interventions + Data Collection without Interventions
Experiment 3: Online Training with Interventions + Data Collection with Interventions
Experiment 4: Re-Running Online Training with Interventions + Data Collection without Interventions
Experiment 5: Online Training without Interventions + Data Collection with Interventions
Experiment 8: Updated Experiment (3) with Additional Duplicate Catastrophe Transitions
Experiment 9: Updated Experiment (3) with Additional Diverse Catastrophe Transitions
Reminder on Experiment (3): Online Training with Interventions & Data Collection with Interventions + Additional Catastrophe Data Collection.
Experiment 9C: Evaluation Penalty set to -4 (1000 Episodes, 20K Epochs)
Mean and Standard Deviation of Max Eval Reward over 5 Seeds:
With Human Actions: -115.2 ± 37.7
Without Human Actions: -111.6 ± 25.9
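A minimal sketch of how a summary like the one above could be computed. It assumes that "Max Eval Reward" means the best evaluation reward reached within each seed's run, and that the reported spread is the standard deviation across seeds; the report does not say whether the population or sample standard deviation was used, so `ddof` is left as a parameter. The function name, the data layout, and the toy numbers are hypothetical.

```python
import numpy as np

def summarize_max_eval_reward(eval_curves, ddof=0):
    """Return (mean, std) of the per-seed maximum evaluation reward.

    eval_curves: one sequence of evaluation rewards per seed (assumed layout).
    ddof: 0 for population std, 1 for sample std (the report does not specify).
    """
    best_per_seed = np.array([np.max(curve) for curve in eval_curves], dtype=float)
    return float(best_per_seed.mean()), float(best_per_seed.std(ddof=ddof))

# Toy curves for two seeds, not taken from the experiments.
toy_curves = [[-200.0, -150.0, -120.0], [-200.0, -110.0, -130.0]]
mean, std = summarize_max_eval_reward(toy_curves)
print(f"{mean:.1f} ± {std:.1f}")
```

With only 5 seeds the choice of `ddof` changes the reported spread noticeably, so it is worth stating which one was used alongside the numbers.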
Mean and Standard Deviation of Entropy of Q-Values on Catastrophic States:
With Human Actions: 0.816 ± 0.0502
Without Human Actions: 1.026 ± 0.0187
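The report does not define how the entropy of the Q-values is computed. One common reading, sketched below as an assumption, is the Shannon entropy (natural log) of a softmax over the Q-values at each catastrophic state, averaged over states. The reported values sit below ln 3 ≈ 1.099, which is consistent with that reading for Mountain Car's 3 discrete actions but does not confirm it. The function name and the `temperature` parameter are hypothetical.

```python
import numpy as np

def q_value_entropy(q_values, temperature=1.0):
    """Entropy (in nats) of softmax(Q / temperature) along the last axis.

    q_values: array of shape (..., num_actions); assumed layout.
    """
    q = np.asarray(q_values, dtype=float) / temperature
    q = q - q.max(axis=-1, keepdims=True)               # shift for numerical stability
    log_p = q - np.log(np.exp(q).sum(axis=-1, keepdims=True))  # log-softmax
    p = np.exp(log_p)
    return -(p * log_p).sum(axis=-1)

# Toy Q-values for two states with 3 actions each, not taken from the experiments.
toy_q = np.array([[0.0, 0.0, 0.0],      # indifferent between actions: maximal entropy
                  [100.0, 0.0, 0.0]])   # one dominant action: entropy near zero
per_state = q_value_entropy(toy_q)
print(per_state, per_state.mean())
```

Under this reading, lower entropy on catastrophic states means the learned Q-function prefers some actions more sharply there, while values near ln 3 mean it is close to indifferent.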
﻿
Experiment 9D: Evaluation Penalty set to -12 (1000 Episodes, 20K Epochs)
Mean and Standard Deviation of Max Eval Reward over 5 Seeds:
With Human Actions: -114.4 ± 12.7
Without Human Actions: -111.6 ± 25.9
﻿
Experiment 9E: Re-Running Experiment 9D with New Data (1000 Episodes, 20K Epochs)
Mean and Standard Deviation of Max Eval Reward over 5 Seeds:
With Human Actions: -106.2 ± 14.3
Without Human Actions: -112.0 ± 16.4
﻿
Experiment 9F: Re-Running Experiment 9D with New Data (2000 Episodes, 40K Epochs)