Data Science Experiments Management with Weights & Biases

We were thrilled when we came across this report about how the team at BroutonLab uses W&B, so we're sharing it with you! Made by the BroutonLab team using W&B.


Probably every data scientist has come across this situation: you create many different models to experiment with different parameters or even entire architectures. You also want to experiment with the choice of optimizer, learning rate, number of epochs, and so on. You quickly end up with many different experiments, and it becomes more and more difficult to structure the results. In this article, we'll show you how to properly and conveniently manage and log your ML and DL experiments.
Today there are many tools that let you conveniently manage your experiments, such as Weights & Biases, MLflow, Neptune, Comet, and others.
MLflow is an open source platform for managing the machine learning lifecycle. It is great for individual use, but not very suitable for large teams or for a large number of experiments.
Neptune is a lightweight run management tool that helps you keep track of your machine learning runs. It offers three types of subscription, two of which are paid. If you use the service individually, you get free access.
Comet is a machine learning platform for tracking, comparing, explaining, and optimizing experiments and models. It also offers different subscriptions, but limits teams to a maximum of five members.
We will show you how to effectively log experiments using one of these platforms, namely Weights & Biases.

Weights & Biases overview

W&B is a platform that helps data scientists track their models, datasets, system information, and more. With a few lines of code, you can start tracking all of it. It's free for personal use; team use is normally paid, but teams for academic purposes are free. You can use W&B with your favorite framework, like TensorFlow, Keras, PyTorch, scikit-learn, fastai, and many others.
All tracking information is sent to a dedicated project page on the W&B UI, where you can open high quality visualizations, aggregate information and compare models or parameters. One of the advantages of remotely storing the experiment’s information is that it is easy to collaborate on the same project and share the results with your teammates.
W&B provides 4 useful tools:
  1. Dashboard: Experiment tracking
  2. Artifacts: Dataset versioning, model versioning
  3. Sweeps: Hyperparameter optimization
  4. Reports: Save and share reproducible findings
Later in this tutorial, we will go over all of these utilities.
Figure 1. W&B features. Source: Weights & Biases docs.


Try it on Google Colaboratory →
To start, we should create a free account on the W&B website. Then let's create a Jupyter notebook with a simple Keras classifier model.
!pip install wandb -q
import wandb
!wandb login
Now let’s create a new project in W&B and set a config with hyperparameters for the first experiment.
project_name = 'first_steps'
group_name = 'cnn'
experiment_name = '2_conv'

wandb.init(
    project=project_name,
    group=group_name,
    name=experiment_name,
    config={
        "conv_1": 32,
        "activation_1": "relu",
        "kernel_size": (3, 3),
        "pool_size": (2, 2),
        "dropout": 0.3,
        "conv_2": 64,
        "activation_out": "softmax",
        "optimizer": "adam",
        "loss": "sparse_categorical_crossentropy",
        "metric": "accuracy",
        "epoch": 6,
        "batch_size": 32
    })
config = wandb.config
As you can see, config is a dictionary with hyperparameters. You can also load config files in .yaml format. wandb.init creates a new run in W&B and launches a background process to sync data.
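As mentioned above, the config can also live in a .yaml file instead of an inline dictionary. A hypothetical config.yaml holding the same hyperparameters might look like this (the exact layout W&B expects is described in its docs; this is a plain key-value sketch):

```yaml
# Hypothetical config.yaml with the run's hyperparameters
conv_1: 32
activation_1: relu
kernel_size: [3, 3]
pool_size: [2, 2]
dropout: 0.3
conv_2: 64
activation_out: softmax
optimizer: adam
loss: sparse_categorical_crossentropy
metric: accuracy
epoch: 6
batch_size: 32
```

Keeping hyperparameters in a file like this makes it easy to version the config alongside your code.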
The next step is loading data and defining a simple CNN model.
from wandb.keras import WandbCallback

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
our_model = cnn_mnist()
class_names = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

our_model.fit(x_train, y_train, epochs=config.epoch, batch_size=config.batch_size,
              validation_data=(x_test, y_test),
              callbacks=[WandbCallback(data_type="image", labels=class_names)])
wandb.finish()
We used the Keras callback to automatically save all the metrics and loss values tracked during model.fit. The WandbCallback() class supports a number of options, like data_type, labels, etc.


Now we can look at the results. The run we have executed is now shown on the left side, in our project, with the group and experiment names we listed. We have access to a lot of information that W&B has automatically recorded.
We have access to several sections, such as charts, system metrics, logs, and files.
All of these logs and files are needed to recreate an experiment, and you can now automate the creation of such logs and files to save time. You can view our dashboard for this project here.


Sweeps

Sweeps is a tool for hyperparameter and model optimization that gives you powerful levers to configure your sweeps exactly how you want them, with just a few lines of code. If you want to learn more about hyperparameter optimization techniques, you can check out our article Efficient Hyperparameter Optimization with Optuna Framework. It goes into more detail about optimization mechanisms.
First, you should define a config with the hyperparameters you are going to optimize. You should also choose a hyperparameter optimization strategy (random search or grid search) and the metric you are going to optimize.
sweep_config = {
    'method': 'random',  # grid, random
    'metric': {
        'name': 'accuracy',
        'goal': 'maximize'
    },
    'parameters': {
        'epoch': {'values': [5, 10]},
        'dropout': {'values': [0.3, 0.4, 0.5]},
        'conv_1': {'values': [16, 32, 64]},
        'conv_2': {'values': [16, 32, 64]},
        'optimizer': {'values': ['adam', 'nadam', 'sgd', 'rmsprop']},
        'activation_1': {'values': ['relu', 'elu', 'selu', 'sigmoid']},
        'kernel_size': {'values': [(3, 3), (5, 5), (7, 7)]}
    }
}
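To see why the choice between 'grid' and 'random' matters here, we can count the combinations the grid strategy would have to try. A quick plain-Python sketch using the same parameter values as the config above:

```python
import random

# Values from sweep_config's 'parameters' section
param_grid = {
    "epoch": [5, 10],
    "dropout": [0.3, 0.4, 0.5],
    "conv_1": [16, 32, 64],
    "conv_2": [16, 32, 64],
    "optimizer": ["adam", "nadam", "sgd", "rmsprop"],
    "activation_1": ["relu", "elu", "selu", "sigmoid"],
    "kernel_size": [(3, 3), (5, 5), (7, 7)],
}

# 'grid' tries every combination of values...
n_grid = 1
for values in param_grid.values():
    n_grid *= len(values)
print(n_grid)  # 2592 full training runs

# ...while 'random' just samples one value per parameter for each run
random.seed(0)
sample = {name: random.choice(values) for name, values in param_grid.items()}
```

With 2592 combinations in the full grid, random search is the practical choice for a budget of a few dozen runs.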
Then let’s create a sweep and define a train function. The sweep calls this function with each set of hyperparameters.
sweep_id = wandb.sweep(sweep_config, entity=user_name, project="first_steps")

def train():
    # Default values for hyperparameters we're going to sweep over
    config_defaults = {
        "conv_1": 32,
        "activation_1": "relu",
        "kernel_size": (3, 3),
        "pool_size": (2, 2),
        "dropout": 0.1,
        "conv_2": 64,
        "activation_out": "softmax",
        "optimizer": "adam",
        "loss": "sparse_categorical_crossentropy",
        "metric": "accuracy",
        "epoch": 6,
        "batch_size": 32
    }
    # Initialize a new wandb run
    wandb.init(config=config_defaults, group='first_sweeps')
    # config is a variable that holds and saves hyperparameters and inputs
    config = wandb.config
    model = cnn_mnist(config=config)
    model.fit(x_train, y_train, epochs=config.epoch, batch_size=config.batch_size,
              validation_data=(x_test, y_test),
              callbacks=[wandb.keras.WandbCallback()])

wandb.agent(sweep_id, train)
We got the following output:
Let's look at the results of each sweep and choose the most appropriate hyperparameters for our model. You can see how the accuracy changes depending on the set of parameters we are interested in. In addition to these charts, you can build the charts that matter most to you. For example, you can also log the model's predictions and find the examples on which the model makes mistakes most often.
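Finding those frequent mistakes comes down to comparing predictions with labels. A minimal plain-Python sketch (the find_mistakes helper is our own illustration; in a real run you would then log the corresponding images, e.g. as wandb.Image objects, via wandb.log):

```python
def find_mistakes(predictions, labels):
    """Return the indices where the predicted class differs from the true class."""
    return [i for i, (pred, true) in enumerate(zip(predictions, labels))
            if pred != true]

# Toy example: predicted vs. true digit classes for five test images
preds = [7, 2, 1, 0, 4]
truth = [7, 2, 1, 0, 9]
mistakes = find_mistakes(preds, truth)  # -> [4]
```

The returned indices point at the test images worth inspecting in the dashboard.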


Artifacts

In addition to Sweeps and the Dashboard, W&B also provides a useful utility called Artifacts that allows you to log your data and models. In this context, Artifacts are produced objects (the outputs of processes): datasets and models. We'll show you how to use Artifacts through dataset logging from our previous example.
Let's first load a raw dataset, then create a new Artifact.
def load_and_log():
    with wandb.init(project=project_name, job_type="load-data") as run:
        datasets = load_data()
        names = ["training", "validation", "test"]
        # Artifact
        raw_data = wandb.Artifact(
            "mnist-raw", type="dataset",
            description="Raw MNIST dataset, split",
            metadata={"source": "keras.datasets.mnist",
                      "train_data": len(datasets[0].x),
                      "valid_data": len(datasets[1].x),
                      "test_data": len(datasets[2].x)})
        for name, data in zip(names, datasets):
            # Save our datasets
            with raw_data.new_file(name + ".npz", mode="wb") as file:
                np.savez(file, x=data.x, y=data.y)
        # Save Artifact
        run.log_artifact(raw_data)

load_and_log()
Here are some good practices for Artifact management:
All these practices will help you and your teammates properly organize your pipeline structure.
If we go to the run page and the Artifacts tab, we will see the following:
"mnist-raw" is an output Artifact that contains:
Now let’s add a new Artifact, which will describe data preprocessing:
def preprocess_and_log(preprocess_steps):
    with wandb.init(project=project_name, job_type="data_preprocessing",
                    name="preprocess_simple") as run:
        processed_data = wandb.Artifact(
            "mnist-preprocessed", type="dataset",
            description="Preprocessed MNIST dataset",
            metadata=preprocess_steps)
        # which Artifact we will use
        raw_data_artifact = run.use_artifact('mnist-raw:latest')
        # download Artifact
        raw_dataset = raw_data_artifact.download()
        for split in ["training", "validation", "test"]:
            datafile = split + ".npz"
            data = np.load(os.path.join(raw_dataset, datafile))
            raw_split = Dataset(x=data["x"], y=data["y"])
            processed_dataset = preprocess_dataset(raw_split, **preprocess_steps)
            with processed_data.new_file(split + ".npz", mode="wb") as file:
                np.savez(file, x=processed_dataset.x, y=processed_dataset.y)
        run.log_artifact(processed_data)
First, we create a new Artifact and name it. Then we choose the Artifact we will use, mnist-raw:latest. After that, preprocess_dataset is applied to each data split. Finally, the results are saved in the new Artifact. So let's run it.
steps = {"normalize": True,
         "expand_dims": True,
         "to_categorical": True}

preprocess_and_log(steps)
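The preprocess_dataset helper itself isn't shown above. As an illustration, its to_categorical step could be sketched in plain Python like this (a hypothetical helper; in real Keras code you would use tf.keras.utils.to_categorical):

```python
def to_categorical(labels, num_classes):
    """One-hot encode a list of integer class labels (plain-Python sketch)."""
    return [[1 if i == label else 0 for i in range(num_classes)]
            for label in labels]

# Digits 0 and 2 become one-hot rows of length 3
encoded = to_categorical([0, 2], num_classes=3)  # -> [[1, 0, 0], [0, 0, 1]]
```

The normalize and expand_dims steps would similarly scale pixel values to [0, 1] and add a channel dimension.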
As you can see, we now have 2 Artifacts: "mnist-raw" and "mnist-preprocessed". The Graph view presents a user-friendly interface for your pipeline steps: rectangles are input/output Artifacts, and circles are the processes between them. With the help of the Graph view, you can easily track how your pipeline has changed over the course of your work.
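The 'mnist-raw:latest' reference used above is an alias: each time you log an Artifact with the same name, W&B stores a new version (v0, v1, ...) and moves "latest" to it. A plain-Python sketch of how such an alias resolves to a concrete version (illustrative only; W&B does this server-side):

```python
def resolve_artifact(name_with_alias, registry):
    """Resolve 'name:alias' to 'name:version' using a toy version registry."""
    name, alias = name_with_alias.split(":")
    versions = registry[name]          # e.g. ["v0", "v1", "v2"], oldest first
    if alias == "latest":
        return name + ":" + versions[-1]
    return name + ":" + alias          # an explicit version pins that exact data

# Toy registry: "mnist-raw" has been logged three times
registry = {"mnist-raw": ["v0", "v1", "v2"]}
resolved = resolve_artifact("mnist-raw:latest", registry)  # -> "mnist-raw:v2"
```

Pinning an explicit version instead of "latest" is what makes old pipeline runs reproducible.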


Reports

You can easily organize visualizations, describe your findings, and share updates with your team using Reports. In a report, you can include whatever you think is necessary: various charts, dependencies, plans for the future. You can edit your reports with LaTeX or Markdown.
It's very simple to create a new report: go to your workspace and click Create Report. You can create a report for a single project or group, or a cross-project report to compare runs from two projects. W&B provides a few templates, or you can create a custom blank report. This post, which you are reading, is an example of a report.
Figure 3. Report templates. Source: Weights & Biases docs.
Here is an example of a snapshot report for this project.
Figure 4. Report example.
Now we can share the results with our team. Reports helped us summarize the information from today's experiments. In the future, we won't need to search for the results of each experiment separately; it will be enough to open the corresponding Report.


Conclusion

In this tutorial, we have shown how you can effectively manage and log your experiments with Weights & Biases, and we have reviewed the platform's main tools. For more advanced study, you can refer to the official documentation. We hope this guide will be useful in your future work.
This article was independently produced by BroutonLab.