How Weights and Biases Can Help with Audits & Regulatory Guidelines

Use W&B Artifacts to make your teams' models auditable. Made by Aman Arora using Weights & Biases
Aman Arora
Before joining Weights & Biases, I worked at a medical company whose product was being used in major hospitals across Australia. And here's the thing about working for medical startups in AI: they are really big on compliance and auditing!
In our case, a deep learning model was responsible for detecting findings in X-rays or CT-scans. In other words: AI was directly affecting human lives, so things were pretty strict when it came to model changes. Every change had to be approved by regulatory agencies and every process had to be recorded. All processes (such as data collection, data cleaning, model training, model evaluation, etc.) had to be foolproof.

Before W&B integration

Before integrating with Weights & Biases, things were pretty messy. A large Excel sheet had to be maintained to log every model change. This Excel sheet looked something like:
Table-1: Excel sheet to log model changes
And since the company had three different models (one for X-rays, another for CT-scans, and a final one for MRIs) that solved different problems for more than 50 different clients, you can imagine how messy and massive these Excel sheets became!
Beyond that, Australian medical regulatory guidelines are pretty strict, which meant the medical company I worked for was part of an audit almost every year. Every process was cross-verified: data collection, data cleaning, model training, model evaluation, data leakage checks, and so on.
One question I remember being asked by the auditors that I didn't have an answer for at the time was:
How do we know that the model being shared with the clients is the same model that you trained 3 months ago? What if the file that you've shared is the wrong one or that it got overwritten by someone else in the team?
Let me try and visualize this with the help of an example:
Figure-1: Simple real world use case
Let's just say I am a Machine Learning Engineer at some medical company. I train a ResNet-34 model that is now production-ready and needs to be shared with three clients: Clients A, B & C.
First, we maintain a log of this model and all the processes in an Excel sheet similar to Table-1. Now, we need to share the final version with the clients.
What are the typical ways for me to do this?
  1. Save the model on Google Drive and share a link with the three clients?
  2. Save the model on a private cloud such as AWS/GCP and let the clients download it?
  3. Share a copy of the model with each of the three clients individually?
For sensitive data and models, none of the three options would work. The question is: why?
There is a chance of error in any of the three cases, as there is a manual step involved. What if I upload the wrong model to Google Drive or the private cloud? What if the model artifact I wanted to upload was called checkpoint-100.pth.tar and I instead uploaded checkpoint-10.pth.tar?
Anything is possible! Especially if you are part of heavy audits and have to follow regulation and compliance practices, your method needs to be foolproof.

After W&B integration

Weights and Biases Artifacts to the rescue! Now, as part of your training, you can integrate with Weights and Biases to log runs and metrics, and also save the best models alongside their performance scores!
Wait, but how does this help?
Figure-2: Using W&B as the central place for sharing model artifacts
Instead of directly sharing the models with your clients, one intermediate step could be to save the model artifacts to Weights and Biases!

What are W&B Artifacts?

From our docs:
W&B Artifacts was designed to make it effortless to version your datasets and models, regardless of whether you want to store your files with W&B or whether you already have a bucket you want W&B to track. Once you've tracked your datasets or model files, W&B will automatically log each and every modification, giving you a complete and auditable history of changes to your files. This lets you focus on the fun and important parts of evolving your datasets and training your models, while W&B handles the otherwise tedious process of tracking all the details.
Great! Weights and Biases can take care of versioning and storage for us. But what's the benefit?
By using Weights and Biases, we can ensure that each client gets the correct model version. Since there is no human in the loop, this minimizes risk. Also, for companies that have to undergo regular audits and follow strict compliance guidelines, just sharing the W&B run_id should do the trick, as everything can then be accessed through Weights and Biases, including configs, model artifacts, and data artifacts.
It's really easy to log W&B Artifacts using four simple lines of code (the artifact name and file path below are placeholders):
wandb.init()
artifact = wandb.Artifact('artifact-name', type='model')  # placeholder artifact name
artifact.add_file('path/to/checkpoint.pth')               # placeholder file path
wandb.run.log_artifact(artifact)
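Once a model is logged this way, pulling the exact same version back out is just as easy. Here is a minimal sketch using the W&B public API, assuming a placeholder entity/project/artifact path:
import wandb

# Placeholder path of the form "entity/project/artifact-name:version";
# replace it with your own.
api = wandb.Api()
artifact = api.artifact("my-entity/Artifacts/model:v0", type="model")

# The digest is a checksum of the artifact's contents and can be used to
# verify that what a client received matches what was logged.
print(artifact.digest)

# Download the artifact files locally; returns the download directory.
model_dir = artifact.download()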

Why use Weights and Biases?

  1. The process is automated: With just a few lines of code, all models and datasets can be properly versioned and stored using W&B.
  2. Minimizes the chance of error: Since there is no human in the loop, you can be sure that each client gets the same copy of the model that was saved during training (automatically).
  3. Everything is reproducible/traceable: Integrating Weights and Biases into your pipelines ensures that all experiments can be reproduced. Thus, if your company is ever part of an audit, all you need to do is share the experiment ID, which will have the logs of the training script used at the time, the model version served in production, the model's performance scores and more! See this Dashboard for an example.
  4. Everything is in one place: Once you integrate with Weights and Biases, all your logs, source code files, metrics, model versions, and dataset versions are in one place! This makes it really easy to get a high-level view of everything, even after a few months or years! No longer do you need to maintain Excel spreadsheets, notebooks, etc.
  5. Host on your private cloud: If your data is sensitive, then Weights and Biases also provides the option to run everything locally and store model and data artifacts in your own personal buckets on AWS/GCP or any other cloud provider. Refer here for more information.
  6. Track everything: Weights and Biases also has the capability to track files that have been saved outside the W&B system! These are called reference artifacts. Reference artifacts are great for sensitive data that can be stored in your own privately managed S3 bucket, GCS bucket, HTTP file server, or even an NFS share! For an example of tracking reference files in GCP, with code and screenshots, follow our Guide to Reference Artifacts.
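To make the last point concrete, here is a minimal sketch of logging a reference artifact; the bucket path and artifact name are placeholders, and W&B records only checksums and metadata while the data itself stays in your bucket:
import wandb

run = wandb.init(project="Artifacts", job_type="upload")

# Track files that live in your own bucket; the S3 path below is a placeholder.
artifact = wandb.Artifact("xray-dataset", type="dataset")
artifact.add_reference("s3://my-private-bucket/datasets/xray/")
run.log_artifact(artifact)
run.finish()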

A simple framework to structure your future experiments

In this section, I will run through a simple example that showcases how to store model artifacts on AWS and use Weights and Biases for model and dataset versioning for regulatory purposes.
(All code below can be accessed in this Google Colab.)
The notebook kicks off a training run that automatically logs model checkpoints as W&B Artifacts for you. By default, the project is called "Artifacts"; you can change this by updating the PROJECT key-value in the config dict.
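For illustration, here is a minimal sketch of how such a config dict might be wired into a run; apart from PROJECT, the keys below are assumptions, and the notebook's actual config may differ:
import wandb

# Illustrative config; only the PROJECT key is referenced above,
# the other keys are assumed, typical training settings.
config = dict(
    PROJECT="Artifacts",
    EPOCHS=5,
    BATCH_SIZE=32,
    LEARNING_RATE=1e-3,
)

run = wandb.init(project=config["PROJECT"], config=config)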
So, from this point on, we are always in control. This is what the training process looks like:
Figure-3: How W&B makes it super easy for users to reproduce experiments
As part of this minimal training example with W&B integration, this is what happens:
Figure-4: Source code files
Figure-5: W&B model artifacts
Figure-6: W&B simple dashboard
As can be seen in the examples, Weights & Biases makes it easy for users to reproduce experiments, log model configs and weights, and save source code files; a single Weights & Biases run_id can then be used as a central place to access all of this at once!
No longer do we need Excel sheets similar to Table-1, since everything is already stored as metadata in W&B. We have metric information, performance scores, model weights, dataset versions, the date and time of the training process, the user who triggered it, and all source code files.
Thus, if an auditor now asks, "What did the training process look like for a model trained 6 months ago?", we have a change log, thanks to Weights & Biases, with all the relevant information!
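To make that concrete, here is a minimal sketch of how that information could be pulled back out with the W&B public API, assuming a placeholder run path:
import wandb

api = wandb.Api()

# Placeholder run path of the form "entity/project/run_id".
run = api.run("my-entity/Artifacts/abc123")

print(run.created_at)   # when the training run was started
print(run.user)         # who triggered the training run
print(run.config)       # configuration used at the time
print(run.summary)      # final metrics / performance scores

# Model and dataset artifacts logged by this run.
for artifact in run.logged_artifacts():
    print(artifact.name, artifact.type, artifact.digest)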

But what about sharing model artifacts with clients?

Everything we've said above still doesn't solve the problem we saw in Figure-1: how do we share a model artifact that's stored in W&B with the clients?
Figure-7: Sharing model artifact with clients without any human interaction in the loop
The process is explained in Figure-7 above: we extract the model artifact and store it in a private AWS/GCP bucket programmatically. We can then provide the clients with access to this bucket and let them download the model weights from there.
A script to programmatically download the model weights from W&B and upload them to AWS has been provided here.
For this script to work, please make sure:
  1. You have an active account on AWS
  2. The AWS CLI has been installed and configured. If not, follow the instructions mentioned here.
Here's what the Upload Artifact to S3 function from the Google Colab does:
Figure-8: Programmatically upload model artifact to AWS
  1. Download the model artifact from W&B: First, we provide the project and artifact name in the config. Based on this, we fetch the artifact information from Weights and Biases, including the artifact's digest. The digest is a checksum of the artifact's contents; if an artifact has the same digest as another file, the two have identical contents.
  2. Check if the file exists on AWS: Next, we check whether a file with the same name exists on AWS. If it does, we read the digest stored under the object's metadata on S3 and compare it with the artifact's digest. If they differ, we upload the W&B artifact to S3; otherwise, we do nothing.
  3. Upload the model artifact along with its digest to S3: If the file does not exist on AWS, or a new version is available, we upload the file to S3 and add the digest as metadata. (A code sketch of these steps is shown after Figure-9 below.)
Figure-9: Add Metadata when uploading W&B Artifact to S3.
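Below is a hedged sketch of these three steps using boto3 and the W&B public API; the artifact path, bucket, key, and checkpoint file name are all placeholders, and the actual Colab implementation may differ in its details:
import os
import boto3
import wandb

def upload_artifact_to_s3(artifact_path, bucket, key):
    # Step 1: download the model artifact from W&B and read its digest.
    api = wandb.Api()
    artifact = api.artifact(artifact_path, type="model")
    local_dir = artifact.download()

    s3 = boto3.client("s3")

    # Step 2: if an object with the same key already exists on S3 and its
    # stored digest matches the artifact's digest, there is nothing to do.
    try:
        head = s3.head_object(Bucket=bucket, Key=key)
        if head["Metadata"].get("digest") == artifact.digest:
            print("Model on S3 already matches the W&B artifact; skipping upload.")
            return
    except s3.exceptions.ClientError:
        pass  # Object does not exist yet; fall through and upload it.

    # Step 3: upload the checkpoint and store the digest as S3 object metadata.
    model_file = os.path.join(local_dir, "checkpoint.pth")  # placeholder file name
    s3.upload_file(
        model_file,
        bucket,
        key,
        ExtraArgs={"Metadata": {"digest": artifact.digest}},
    )

# Example usage with placeholder names:
# upload_artifact_to_s3("my-entity/Artifacts/model:latest",
#                       bucket="my-model-bucket", key="clients/checkpoint.pth")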
By using code and doing everything programmatically, we have completely eliminated human interaction from the loop, making this process foolproof and great for compliance and audit purposes!

Conclusion

In this report, I shared my experience of how things were before and after integrating with Weights & Biases at my previous company.
As can be seen, integrating with Weights & Biases really makes processes much smoother and more foolproof! We also saw an example of how one could share model artifacts straight from Weights & Biases with clients without any human in the loop!
Finally, I also shared code in this repo so you can apply the same process at your own company. If you have any questions, please feel free to reach out to me at aman@wandb.com.