
Code Versioning for Multiple Files

A draft proposal for logging and comparing code beyond the single-file Code Tab
Created on September 11 | Last edited on September 13

Goal: Log & version code from multiple files to W&B

The current Code Tab feature lets us version only one script (typically the main training script). We would like to easily log, view, and especially compare code that lives outside the main script (e.g. model definitions or any other meaningful code in separate modules). One solution relies on Artifacts.

Logging code as an Artifact

Let's say my repo contains, at the root, the main training script main.py and a training submodule, which includes various model definitions and more detailed training logic: models.py for the base network, dqn.py for the DQN algorithm for reinforcement learning, ppo.py for PPO, and so on. When I load these modules for use in my main training script, I can save the corresponding code files as W&B Artifacts.

Example repo root

  • /main.py
  • /training/models.py
  • /training/dqn.py
  • /training/ppo.py

Example code

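import wandb

from training.models import SafeLifePolicyNetwork
from training.ppo import PPO

# assumes wandb.init() has already been called, so wandb.run is an active run
# training_envs, testing_envs, and data_logger are defined elsewhere in main.py
model = SafeLifePolicyNetwork()
algo = PPO(model,
           training_envs=training_envs,
           testing_envs=testing_envs,
           data_logger=data_logger)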

# log PPO code to W&B
# create an artifact named "ppo" of type "rl_algo_code"
rl_algo = wandb.Artifact("ppo", type="rl_algo_code")
# add the corresponding file via a path relative to the current working directory
rl_algo.add_file("./training/ppo.py")
# log the Artifact to W&B
wandb.run.log_artifact(rl_algo)

# [ skipping lots of other code ]

# when using DQN instead
from training.dqn import dqn
rl_algo = wandb.Artifact("dqn", type="rl_algo_code")
rl_algo.add_file("./training/dqn.py")
wandb.run.log_artifact(rl_algo)

# for model code generally
model_code = wandb.Artifact("model_code", type="model_code")
model_code.add_file("./training/models.py", name="models.py")
wandb.run.log_artifact(model_code)

What does this do?

Artifacts will be logged to W&B alongside my experiment runs and versioned.

  • If there are changes to a file, a new version will be added and saved with that particular run.
  • If a code file has not changed (as determined by comparing its MD5 hash), no new copy is uploaded and the version is not incremented.
  • For each artifact, I can compare across versions and know precisely what, if anything, changed in which run.
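The versioning rule in the bullets above can be sketched in a few lines of plain Python. This is only an illustrative model, not W&B's implementation: ArtifactStore and its log method are hypothetical names, and comparing against only the most recent version is an assumption made to keep the sketch short.

```python
import hashlib


class ArtifactStore:
    """Toy in-memory model of content-based versioning (hypothetical, not the W&B API)."""

    def __init__(self):
        # Maps (artifact name, artifact type) -> list of MD5 digests, one per version.
        self._versions = {}

    def log(self, name, type_, content):
        """Record `content` (bytes) and return the version label it resolves to, e.g. 'v0'."""
        digest = hashlib.md5(content).hexdigest()
        history = self._versions.setdefault((name, type_), [])
        if history and history[-1] == digest:
            # Same bytes as the latest version: nothing is stored, version is unchanged.
            return f"v{len(history) - 1}"
        # New or changed content: append a new version.
        history.append(digest)
        return f"v{len(history) - 1}"


store = ArtifactStore()
first = store.log("model_code", "model_code", b"class Net: pass\n")
same = store.log("model_code", "model_code", b"class Net: pass\n")
changed = store.log("model_code", "model_code", b"class Net:\n    hidden = 64\n")
print(first, same, changed)
```

Logging identical bytes twice resolves to the same version, while any change to the file produces the next version number.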

What does it look like?

Here is a very early preview comparing two versions of models.py: v0 versus v4, the latest (also shown below). It's raw for now, but we plan to turn this into a nice GitHub-style diff!
[Screenshot: Screen Shot 2020-09-13 at 4.24.06 PM.png]
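Until a proper diff view exists, one way to approximate a GitHub-style comparison is to diff two downloaded versions of a file locally. The following is a minimal sketch using Python's standard difflib module; the function name unified_code_diff and the two file contents are invented placeholders, not the real models.py.

```python
import difflib


def unified_code_diff(old_source, new_source, old_label, new_label):
    """Return a unified diff (as one string) between two versions of a source file."""
    old_lines = old_source.splitlines(keepends=True)
    new_lines = new_source.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile=old_label, tofile=new_label)
    )


# Placeholder contents standing in for two logged versions of a file.
v0 = "class PolicyNetwork:\n    hidden_size = 64\n"
v4 = "class PolicyNetwork:\n    hidden_size = 128\n"

diff_text = unified_code_diff(v0, v4, "models.py (v0)", "models.py (v4)")
print(diff_text)
```

Removed lines are prefixed with "-" and added lines with "+", and identical inputs yield an empty string.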

Notes

Artifact types

I've chosen to differentiate the code by logic type (models vs RL algorithms) for convenience and precision because I expect these to change at different rates. One could also keep all code artifacts as type=code.
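For comparison, the single-type alternative could look roughly like the sketch below, which bundles every Python file under a directory into one artifact of type "code". The helper names collect_code_files and log_code_artifact and the default artifact name "training_code" are hypothetical; only wandb.Artifact, add_file, and log_artifact come from the example above.

```python
from pathlib import Path


def collect_code_files(root, pattern="*.py"):
    """Return sorted POSIX-style paths, relative to `root`, of files matching `pattern`."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob(pattern) if p.is_file()
    )


def log_code_artifact(run, root, artifact_name="training_code"):
    """Log all matching files under `root` as one artifact of type 'code'."""
    import wandb  # imported lazily so collect_code_files works without wandb installed

    artifact = wandb.Artifact(artifact_name, type="code")
    for rel_path in collect_code_files(root):
        # Keep the relative path as the file's name inside the artifact.
        artifact.add_file(str(Path(root) / rel_path), name=rel_path)
    run.log_artifact(artifact)
    return artifact
```

With an active run, this would be called as log_code_artifact(wandb.run, "./training"). The trade-off is that a change to any one file creates a new version of the whole bundle, so it is less obvious at a glance whether the model or the algorithm changed.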

Naming artifacts

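Currently, a new version is created only when an artifact is logged with the same name AND type as an existing artifact. The optional name= argument to add_file sets the path under which the file appears inside the Artifact, which here simply gives it a shorter display path.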



