How do you reproduce experiments?

How do you reproduce experiments? For example, if you have a figure or a number from an experiment, what is your workflow to reproduce it (e.g. setting up environment dependencies, dataset versioning, etc.)?

My current workflow for reinforcement learning experiments uses wandb to track experiments, which records the requirements.txt, the git repo and commit state, the source code, and the command that was used to run the experiment. So if I want to reproduce this experiment (cleanrl/cleanrl.benchmark/runs/2johl5ne), I do the following:
pip install cleanrl
python -m cleanrl.utils.reproduce --run cleanrl/cleanrl.benchmark/runs/2johl5ne
And it outputs a fairly generic set of commands:
# run the following
python3 -m venv venv
source venv/bin/activate
pip install -r https://api.wandb.ai/files/cleanrl/cleanrl.benchmark/2johl5ne/requirements.txt
curl -OL https://api.wandb.ai/files/cleanrl/cleanrl.benchmark/2johl5ne/code/cleanrl/ppo_atari_visual.py
python ppo_atari_visual.py --gym-id QbertNoFrameskip-v4 --total-timesteps 10000000 --wandb-project-name cleanrl.benchmark --prod-mode --capture-video --seed 2
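For context, this reproduce command works because the metadata is captured at training time by the run itself. Below is a minimal sketch of how a run could be instrumented to log it; the config values and the train() call are placeholders, not the actual cleanrl source:

import wandb

# Placeholder hyperparameters; the real script takes these as argparse flags.
config = {
    "gym_id": "QbertNoFrameskip-v4",
    "total_timesteps": 10_000_000,
    "seed": 2,
}

run = wandb.init(
    project="cleanrl.benchmark",
    config=config,
    save_code=True,    # upload the source file so it can be downloaded later
    monitor_gym=True,  # record gym videos when --capture-video is enabled
)

# Explicitly save the local requirements.txt as a run file (wandb may also
# capture one automatically); this is what the reproduce command fetches.
wandb.save("requirements.txt")

# train(run.config)  # placeholder for the actual training loop (e.g. PPO)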

Although these reproduce commands work to a reasonable standard, there are some shortfalls:
  • Non-Python dependencies such as the CUDA toolkit or the MuJoCo simulator installation are not logged, so the commands above can't reproduce the MuJoCo experiments (see the first sketch after this list).
  • Data versioning is not used, so the commands above probably could not reproduce NLP projects, especially those that rely on a local dataset (see the second sketch after this list).
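For the first gap, one partial workaround is to log the relevant system information into the run config, so the CUDA and driver versions are at least recorded even if they can't be reinstalled automatically. A rough sketch (the config keys are just illustrative):

import platform
import subprocess
import torch
import wandb

run = wandb.init(project="cleanrl.benchmark")

# Record non-Python dependencies that pip freeze does not capture.
run.config.update({
    "python_version": platform.python_version(),
    "cuda_version": torch.version.cuda,                # None on CPU-only builds
    "cudnn_version": torch.backends.cudnn.version(),
    "nvidia_driver": subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True,
    ).stdout.strip(),
})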
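For the second gap, something like W&B Artifacts could version a local dataset alongside the run; the project name, artifact name, and data path below are placeholders:

import wandb

run = wandb.init(project="my-nlp-project")

# Log the local dataset directory as a versioned artifact.
dataset = wandb.Artifact("train-data", type="dataset")
dataset.add_dir("data/train")
run.log_artifact(dataset)

# A later run can then pull back the exact same version:
# artifact = run.use_artifact("train-data:v0")
# data_dir = artifact.download()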
What is your workflow, and what best practices do you follow?