Reproducible spaCy NLP Experiments with Weights & Biases

How to use Weights & Biases and spaCy to train custom, reproducible NLP pipelines. Made by Scott Condron using Weights & Biases

In this tutorial, we'll show how to add Weights & Biases to any spaCy NLP project to track your experiments, save model checkpoints, and version your datasets. First, we'll discuss how to integrate W&B into your spaCy projects, and then we'll see how DaCy: An efficient NLP Pipeline for Danish makes use of Weights & Biases to achieve state-of-the-art performance on a number of NLP tasks for the Danish language 🇩🇰.
Here's a quick intro to the two libraries before we get started:
spaCy can serve a lot of your Natural Language Processing (NLP) needs out-of-the-box. This includes Named Entity Recognition (NER), Part of Speech tagging, text classification and more. Even better, these components are all customizable, extendable and composable.
Weights & Biases makes running collaborative machine learning projects a breeze. You can focus on what you're trying to experiment with, and W&B will take on the burden of keeping track of everything. If you want to review a loss plot, download the latest model for production, or just see which configurations produced a certain model, W&B is your friend. There's also a bunch of features to help you and your team collaborate like having a shared dashboard and sharing interactive reports.

Integrating Weights & Biases into your spaCy project

To add Weights & Biases to your project, all you need to do is add a few lines to your project's config .cfg file. For more information, visit the spaCy integration page in our docs.

Add experiment tracking

First, if you want to add experiment tracking so that you can compare metrics across different runs, just add these few lines to your config:
[training.logger]
@loggers = "spacy.WandbLogger.v2"
project_name = "your_project_name"
remove_config_values = []
your_project_name will be the project name on your W&B dashboard. Along with training metrics, your config will also be uploaded. You can pass in a list of config values to exclude using remove_config_values. Below you can see a bunch of training plots and metrics that were produced from a spaCy project.

Add model checkpointing

If you'd like your model checkpoints to be stored using W&B Artifacts, add an additional parameter model_log_interval under [training.logger] to tell spaCy to log your model every N steps.
model_log_interval = 1000
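Once checkpoints are logged as artifacts, a later run or script can pull one back down. Here's a minimal sketch using the public wandb.Api; the artifact path is a placeholder you'd replace with your own entity, project and artifact name:

```python
def download_checkpoint(artifact_path, out_dir="./checkpoint"):
    """Download a model checkpoint that was logged to W&B as an artifact.

    `artifact_path` is a hypothetical example,
    e.g. "your_entity/your_project_name/model:latest".
    """
    import wandb  # imported lazily so this sketch has no import-time side effects

    api = wandb.Api()
    artifact = api.artifact(artifact_path)
    # Downloads the artifact's files and returns the local directory path.
    return artifact.download(root=out_dir)
```

Using a `:latest` alias always fetches the most recent checkpoint, while a pinned version like `model:v3` keeps an experiment tied to one exact set of weights.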

Add dataset versioning

Finally, to upload your dataset to W&B and track versions of it, you can add another parameter log_dataset_dir under [training.logger].
log_dataset_dir = "./assets"

Add all W&B features

By adding only six lines to your config, you now have experiment tracking, model checkpointing and dataset versioning. Here it is all in one place:
[training.logger]
@loggers = "spacy.WandbLogger.v2"
project_name = "your_project_name"
remove_config_values = []
log_dataset_dir = "./assets"
model_log_interval = 1000

Train a spaCy model

Once you've added these config changes, train a model with spaCy by running
python -m spacy train config.cfg --output training/

Pass Weights & Biases API key when you first train a model

Go to wandb.ai to sign up for a free account. Then, when you begin training, you'll be prompted to visit wandb.ai/authorize to get your API key. Paste it into your command line and you're all set up!

Visualize your Named Entity Recognition Data

Along with experiment tracking and managing your data pipeline, Weights & Biases also provides a useful integration with displaCy to help spaCy users view their data and predictions. To log a displaCy-style NER visualization directly to a wandb.Table, you can use wandb.plots.NER(docs=document), where document is a spaCy Doc with annotated named entities.
Here you can see a labeled NER dataset of headlines. Here's the code that creates the table above using wandb.Table and wandb.plots.NER.
import json

import spacy
import wandb

wandb.init(project='wandb_spacy_integration', entity='wandb')
nlp = spacy.load("en_core_web_sm")
data = json.load(open('./annotated_news_headlines-ORG-PERSON-LOCATION-ner.jsonl'))
plots = []
for example in data:
    doc = nlp(example['text'])
    plots.append([wandb.plots.NER(docs=doc)])
table = wandb.Table(data=plots, columns=['displacy NER'])
wandb.log({'spaCy NER table': table})
wandb.finish()

Display model evaluation results using Tables

Not only can Tables be used for data visualization, they can also be used to evaluate the performance of your models.
By putting your model evaluation results in a wandb.Table, you get an interactive table that lets you dive deeper into your metrics. For example, below we can see per-class NER metrics, which highlight our best performing classes.
Besides aggregate metrics, another great feature of Tables is seeing per annotation results. This helps you find the best and worst performing examples in your dataset. In the Table below, you can compare the annotated label with the predicted label and see metrics for each example.
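To illustrate the kind of per-class breakdown such a table holds, here's a small self-contained sketch that computes per-entity-type precision, recall and F1 from gold and predicted spans (the example spans are made up); each resulting row could be logged as a row of a wandb.Table:

```python
from collections import defaultdict

def per_class_ner_metrics(gold, pred):
    """Per-label precision/recall/F1 from lists of (start, end, label) span sets,
    one entry per document."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        for span in p & g:   # predicted span exactly matches a gold span
            tp[span[2]] += 1
        for span in p - g:   # predicted but not in gold
            fp[span[2]] += 1
        for span in g - p:   # gold but missed by the model
            fn[span[2]] += 1
    rows = []
    for label in sorted(set(tp) | set(fp) | set(fn)):
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        rows.append([label, prec, rec, f1])
    return rows  # e.g. wandb.Table(columns=["label", "p", "r", "f"], data=rows)

gold = [[(0, 5, "ORG"), (10, 15, "PERSON")]]
pred = [[(0, 5, "ORG"), (10, 15, "LOCATION")]]
rows = per_class_ner_metrics(gold, pred)  # ORG scores 1.0; PERSON/LOCATION score 0.0
```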

Case Study - DaCy: An NLP Pipeline for Danish 🇩🇰

Many projects use spaCy and Weights & Biases together to track their NLP experiments. One such example is DaCy: An efficient NLP Pipeline for Danish which you can try out on Hugging Face Model Hub.
This great project by the team at Center for Humanities Computing Aarhus achieved state-of-the-art performance in Named Entity Recognition (NER), Part of Speech tagging (POS) and dependency parsing for the Danish language 🇩🇰. Here's their paper if you'd like to check it out.
They evaluated all publicly available Danish Transformer models using spacy-transformers, spaCy's wrapper for Hugging Face's transformers library. These models were fine-tuned on the DaNE corpus, a named entity resource for Danish, and evaluated on NER, POS and dependency parsing. See the paper for further details.
The delicious DaCy logo
By including a few lines of code in their training configs, the DaCy team tracks experiments using Weights & Biases. Here's the dashboard showing all of the results and the runs that produced them.
By clicking the Run set below the panel above, you can see each individual run and its metrics. You can navigate to any run and view the configs that were set, the git state, system metrics (like GPU usage) and even the command that was run to start training.
The Overview of the run named kind-pond-29. The command that was run to start the run is highlighted.
Additionally, DaCy makes use of spaCy Projects to create reproducible model training pipelines. You can simply clone the repo and run spacy project run small, and it'll train a collection of small Danish NLP models for you!
Although this is great, this doesn't help if you've changed your config file or dataset and you want to go back and rerun an experiment. Weights & Biases can help here.
After adding model checkpointing and dataset versioning, you can track everything you need to have a completely reproducible training pipeline. By enabling these features, you can easily download the latest model, reproduce an experiment or see which dataset version was used to train a model.
W&B will detect if your dataset has changed and only update the dataset version if it has. You can also use Artifact references if you don't want to upload your data.
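Outside the spaCy config, a by-reference artifact might be created like this rough sketch using the wandb Artifacts API; the artifact name and URI here are made-up placeholders:

```python
def log_dataset_reference(run, uri):
    """Track a dataset by reference: W&B records checksums and metadata
    for the files at `uri` rather than uploading them.

    `uri` is a placeholder, e.g. "s3://my-bucket/assets" or "file://./assets".
    """
    import wandb  # lazy import keeps this sketch free of import-time side effects

    artifact = wandb.Artifact("dataset", type="dataset")
    artifact.add_reference(uri)  # store a pointer + checksums, not the bytes
    run.log_artifact(artifact)
    return artifact
```

Because checksums are recorded, W&B can still tell you when the referenced data has changed, even though the files never leave your storage.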
The Artifacts section - where you can view your datasets and model checkpoints
The run named playful-glitter-2 and the dataset and model checkpoints it produced.


Sometimes it feels like the machine learning community treats reproducible experiments as a myth. In this tutorial, you've seen first-hand how the Weights & Biases integration with spaCy lets you train models that you can easily reproduce.
By adding a few lines to your spaCy configuration, you can rest assured that each of your experiments, models and datasets is conveniently tracked. We've also seen how you can evaluate your models and visualize your NER datasets using W&B Tables. We take a lot of pride in making sure our products play nicely with different frameworks and are easy to integrate with, so practitioners can focus on the problems they care about. Thanks for reading!
