Reproducible spaCy NLP Experiments with Weights & Biases
How to use Weights & Biases and spaCy to train custom, reproducible NLP pipelines
Introduction
This tutorial shows how to add Weights & Biases to any spaCy NLP project to track your experiments, save model checkpoints, and version your datasets. First, we'll cover how to integrate W&B into your spaCy projects, and then we'll see how DaCy: An efficient NLP Pipeline for Danish uses Weights & Biases to achieve state-of-the-art performance on a range of NLP tasks for Danish 🇩🇰.
Here's a quick intro to the two libraries before we get started:

spaCy can serve many of your Natural Language Processing (NLP) needs out-of-the-box. This includes Named Entity Recognition (NER), Part of Speech tagging, text classification and more. Even better, these components are all customizable, extendable and composable.

Weights & Biases makes running collaborative machine learning projects a breeze. You can focus on what you're trying to experiment with, and W&B will take on the burden of keeping track of everything. If you want to review a loss plot, download the latest production model, or see which configurations produced a specific model, W&B is your friend. There are also a bunch of features to help you and your team collaborate, like having a shared dashboard and sharing interactive reports.

Integrating Weights & Biases into your spaCy project
To add Weights & Biases to your project, you only need to add a few lines to your project's .cfg config file. For more information, visit the spaCy integration page in our docs.
Add experiment tracking
First, if you want to add experiment tracking so that you can compare metrics across different runs, add these few lines to your config:
[training.logger]
@loggers = "spacy.WandbLogger.v2"
project_name = "your_project_name"
remove_config_values = []
your_project_name will be the project name on your W&B Dashboard. Along with training metrics, your config will also be uploaded. You can pass in a list of config values to remove using remove_config_values. Below you can see a bunch of training plots and metrics produced from a spaCy project.
Add model checkpointing
If you'd like your model checkpoints to be stored using W&B Artifacts, add the parameter model_log_interval under [training.logger] to tell spaCy to log your model every N steps.
model_log_interval = 1000
Add dataset versioning
Finally, to upload your dataset to W&B and track versions of it, you can add another parameter log_dataset_dir under [training.logger].
log_dataset_dir = "./assets"
Add all W&B features
By adding only six lines to your config, you now have experiment tracking, model checkpointing, and dataset versioning. Here it all is in one place:
[training.logger]
@loggers = "spacy.WandbLogger.v2"
project_name = "your_project_name"
remove_config_values = []
log_dataset_dir = "./assets"
model_log_interval = 1000
Train a spaCy model
Once you've added these config changes, train a model with spaCy by running
python -m spacy train <your_config_path/config.cfg> --output training/
Pass your Weights & Biases API key when you first train a model
Go to https://wandb.ai to sign up for a free account. When you begin training, you'll be prompted to visit https://wandb.ai/authorize to get your API key. Paste it into your command line, and you're all set up!
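If you'd rather not paste the key at a prompt (say, on a remote machine or in CI), you can authenticate ahead of time from Python. Here's a minimal sketch; the key value is a placeholder, not a real key:

import os
import wandb

# Placeholder only - keep real keys in your environment or a secret store, not in code
os.environ["WANDB_API_KEY"] = "<your-api-key>"
wandb.login()  # reads the key from the environment and authenticates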
Visualize your Named Entity Recognition Data
Along with experiment tracking and managing your data pipeline, Weights & Biases also provides a handy integration with displaCy to help spaCy users view their data and predictions. To log an NER visualization directly to a wandb.Table, you can use wandb.plots.NER(docs=document), where document is a spaCy Doc with annotated named entities.
Here you can see a labeled NER dataset of news headlines. Here's the code that creates the table above using wandb.Table and wandb.plots.NER.
import json
import spacy
import wandb

wandb.init(project='wandb_spacy_integration', entity='wandb')

# Load a pretrained English pipeline and the annotated headlines dataset
nlp = spacy.load("en_core_web_sm")
data = json.load(open('./annotated_news_headlines-ORG-PERSON-LOCATION-ner.jsonl'))

# Run each headline through the pipeline and collect a displaCy NER plot per example
plots = []
for example in data:
    doc = nlp(example['text'])
    plots.append([wandb.plots.NER(docs=doc)])

# Log the plots as a single interactive table
table = wandb.Table(data=plots, columns=['displacy NER'])
wandb.log({'spaCy NER table': table})
wandb.finish()
Display model evaluation results using Tables
Tables aren't just for data visualization; they can also be used to evaluate the performance of your models.
By putting your model evaluation results in a wandb.Table, you get an interactive table that you can use to dig deeper into your metrics. For example, below we can see the per-class metrics for NER, showing our best-performing classes.
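If you'd like to build a table like this yourself, here's a minimal sketch. It assumes your trained pipeline lives at training/model-best and your gold-annotated dev set at corpus/dev.spacy, and it uses a placeholder project name; adjust these to your own setup.

import spacy
import wandb
from spacy.training import Corpus

wandb.init(project="wandb_spacy_integration")

# Load the trained pipeline and the gold-annotated dev set (paths are assumptions)
nlp = spacy.load("training/model-best")
examples = list(Corpus("corpus/dev.spacy")(nlp))

# nlp.evaluate returns aggregate scores plus "ents_per_type": per-label P/R/F for NER
scores = nlp.evaluate(examples)

table = wandb.Table(columns=["label", "precision", "recall", "f1"])
for label, metrics in scores.get("ents_per_type", {}).items():
    table.add_data(label, metrics["p"], metrics["r"], metrics["f"])

wandb.log({"ner_per_class_metrics": table})
wandb.finish()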
Besides aggregate metrics, another great feature of Tables is seeing per-annotation results. This helps you find the best- and worst-performing examples in your dataset. In the table below, you can compare the annotated label with the predicted label and see metrics for each example.
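Here's one way you might put such a table together, again just a sketch under the same assumptions as above (a trained pipeline at training/model-best and a dev set at corpus/dev.spacy). For each example it records the text, the gold entities, the predicted entities, and a per-example entity F1 from spaCy's Scorer:

import spacy
import wandb
from spacy.scorer import Scorer
from spacy.training import Corpus, Example

wandb.init(project="wandb_spacy_integration")

nlp = spacy.load("training/model-best")
scorer = Scorer()

table = wandb.Table(columns=["text", "gold entities", "predicted entities", "ents_f"])
for gold in Corpus("corpus/dev.spacy")(nlp):
    reference = gold.reference
    predicted = nlp(reference.text)            # run the pipeline on the raw text
    eg = Example(predicted, reference)         # pair the prediction with the gold annotations
    gold_ents = [(ent.text, ent.label_) for ent in reference.ents]
    pred_ents = [(ent.text, ent.label_) for ent in predicted.ents]
    ents_f = scorer.score([eg]).get("ents_f")  # per-example entity F1
    table.add_data(reference.text, str(gold_ents), str(pred_ents), ents_f)

wandb.log({"ner_per_example_results": table})
wandb.finish()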
Case Study - DaCy: An NLP Pipeline for Danish 🇩🇰
Many projects use spaCy and Weights & Biases together to track their NLP experiments. One such example is DaCy: An efficient NLP Pipeline for Danish which you can try out on Hugging Face Model Hub.
This great project by the team at Center for Humanities Computing Aarhus achieved state-of-the-art performance in Named Entity Recognition (NER), part-of-speech (POS) tagging, and dependency parsing for Danish 🇩🇰. Here's their paper if you'd like to check it out.
They evaluated all publicly available Danish transformer models using spacy-transformers, spaCy's wrapper around Hugging Face's transformers library. These models were fine-tuned on the DaNE corpus, a named entity resource for Danish, and evaluated on NER, POS tagging, and dependency parsing. See the paper for further details.

The delicious DaCy logo
By including a few lines of code in their training configs, the DaCy team tracks experiments using Weights & Biases. Here's the dashboard showing all of the results and the runs that produced them.
By clicking on the run set below a panel in the dashboard, you can see each individual run and its metrics. You can navigate to any run and view the configs that were set, the git state, system metrics (like GPU usage), and even the command that was run to start training.

The Overview of the run named kind-pond-29. The command that was run to start the run is highlighted.
Additionally, DaCy uses spaCy Projects to create reproducible model training pipelines. You can simply clone the repo and run spacy project run small, and it will train a collection of small Danish NLP models for you!
This is great, but it doesn't help if you've changed your config file or dataset and want to go back and rerun an experiment. Weights & Biases can help here.
After adding model checkpointing and dataset versioning, you can track everything you need to have a completely reproducible training pipeline. By enabling these features, you can easily download the latest model, reproduce an experiment or see which dataset version was used to train a model.
W&B will detect if your dataset has changed and only update the dataset version if it has. You can also use Artifact references if you don't want to upload your data.
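As a quick sketch of what this enables: once checkpoints are stored as Artifacts, you can pull any version back down and load it like a normal spaCy pipeline. The project and artifact names below are placeholders; check the Artifacts tab of your project for the names the logger actually used.

import spacy
import wandb

run = wandb.init(project="wandb_spacy_integration", job_type="download-model")

# Placeholder artifact name and alias - look them up in your project's Artifacts tab
artifact = run.use_artifact("spacy-model-checkpoint:latest")
model_dir = artifact.download()   # local directory containing the serialized pipeline

nlp = spacy.load(model_dir)       # load the checkpoint like any other spaCy pipeline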

The Artifacts section - where you can view your datasets and model checkpoints

The run named playful-glitter-2 and the dataset and model checkpoints it produced.
Conclusion
Sometimes it feels like the Machine Learning community thinks that reproducible experiments are a myth. In this tutorial, you've seen with your own eyes how you can train models that you could easily reproduce using the Weights & Biases integration with spaCy.
By adding a few lines of code to your spaCy configuration, you can rest assured that each of your experiments, models, and datasets is conveniently tracked. We've also seen how you can evaluate your models and visualize your NER datasets using W&B Tables. We take a lot of pride in making sure our products play nicely with different frameworks and are easy to integrate, so practitioners can focus on the problems they care about. Thanks for reading!
Read Next:
Named Entity Recognition with W&B and spaCy
Visualize and explore named entities with spaCy in W&B
Ines & Sofie — Building Industrial-Strength NLP Pipelines
Sofie and Ines walk us through how the new spaCy library helps build end to end SOTA natural language processing workflows.
Visualizing Prodigy Datasets Using W&B Tables
Use the W&B/Prodigy integration to upload your Prodigy annotated datasets to W&B for easier visualization
Hyperparameter Search With spaCy and Weights & Biases
In this article, we explore how to find the optimal hyperparameters for your spaCy project using Weights & Biases Sweeps to automate hyperparameter search.
Tags: Intermediate, NLP, NER, HuggingFace, spaCy, Experiment, Tutorial, W&B Meta, Artifacts, Panels, Plots, Sweeps, Tables