Visualizing Prodigy Datasets Using W&B Tables

Use the W&B/Prodigy integration to upload your Prodigy-annotated datasets to W&B for easier visualization.
Made by Kevin Shen using Weights & Biases

What is Prodigy?

Prodigy is an annotation tool made by Explosion for creating training and evaluation data for machine learning models, as well as for error analysis, data inspection, and data cleaning.
The W&B x Prodigy integration (see the W&B docs) lets you upload a Prodigy-annotated dataset directly to W&B for visualization. The upload takes a single function call and converts the entire dataset to the W&B Table format.

Usage

Requirements

For more information on Prodigy, installation & setup, please refer to the Prodigy documentation.
Apart from Prodigy, this integration also uses the following libraries:

Code

To use the integration, simply call upload_dataset and pass in the name of the annotated dataset that's in the local Prodigy database.
from wandb.integration.prodigy import upload_dataset
upload_dataset("name_of_dataset_in_database")
W&B will automatically try to convert certain image and text fields, such as image URLs and named entity spans, to actual images and spaCy HTML objects. Extra columns may be added to the resulting table to hold these visualizations.
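In practice the call usually sits inside a W&B run, so that the resulting table has a run to be logged to. The following is a minimal sketch under that assumption; the wrapper name upload_to_wandb and the project name "prodigy" are illustrative and not part of the integration.

```python
def upload_to_wandb(dataset_name: str, project: str = "prodigy") -> None:
    """Upload one Prodigy dataset to W&B as a Table inside a fresh run.

    `upload_to_wandb` and the default project name are hypothetical helpers
    for this sketch; only `upload_dataset` comes from the integration.
    """
    # Imported lazily so this module can be loaded without wandb or Prodigy installed.
    import wandb
    from wandb.integration.prodigy import upload_dataset

    # Assumption: the upload logs to the active run, so a run is started first.
    # Using the run as a context manager finishes it when the block exits.
    with wandb.init(project=project):
        upload_dataset(dataset_name)
```

Calling upload_to_wandb("name_of_dataset_in_database") should then produce a run whose logged table holds the dataset, assuming the named dataset exists in the local Prodigy database.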

Examples

Here are two examples of Prodigy-annotated datasets uploaded to W&B. All data fields, including Prodigy metadata fields such as the input hash and task hash, are preserved.

Text with Named Entity Recognition

The spans_visual column added by the integration contains the output of spaCy's NER visualizer, which is applied automatically to every item that has a spans field.
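To make the spans field concrete: a Prodigy NER record stores each entity as character offsets into the text, plus a label. The sketch below uses a made-up record and a hypothetical helper, to_displacy_input, to reshape it into the dictionary layout that spaCy's displaCy visualizer accepts in manual mode. It illustrates the data involved and is not the integration's own code.

```python
from typing import Any


def to_displacy_input(record: dict[str, Any]) -> dict[str, Any]:
    """Reshape a Prodigy-style NER record into displaCy's manual 'ent' format."""
    # Sort spans by start offset so entities appear in reading order.
    spans = sorted(record.get("spans", []), key=lambda s: s["start"])
    ents = [
        {"start": s["start"], "end": s["end"], "label": s["label"]}
        for s in spans
    ]
    return {"text": record["text"], "ents": ents, "title": None}


# Made-up example record; real records also carry hash and metadata fields.
record = {
    "text": "Apple opened an office in London",
    "spans": [
        {"start": 26, "end": 32, "label": "GPE"},
        {"start": 0, "end": 5, "label": "ORG"},
    ],
    "answer": "accept",
}

doc = to_displacy_input(record)
# Slice the text with each span's offsets to recover the entity strings.
entity_texts = [doc["text"][e["start"]:e["end"]] for e in doc["ents"]]
print(entity_texts)
```

With spaCy installed, a dictionary of this shape can be passed to spacy.displacy.render with style="ent" and manual=True to obtain HTML similar to what the spans_visual column displays.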

Images

The following table shows a dataset containing images as base64 data URIs.
The integration adds a new image_visual column, which contains the rendered image for each item's image field.
The integration is able to create image visuals out of file paths, URL links, bucket links, and base64-encoded data URIs.
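The four kinds of image source can be told apart from the string value alone. The sketch below shows one way to do so; the function names and the bucket prefixes (s3:// and gs://) are assumptions made for illustration, and the integration's actual detection logic may differ.

```python
import base64


def classify_image_source(value: str) -> str:
    """Guess which kind of image reference a field value holds (illustrative only)."""
    if value.startswith("data:image/") and ";base64," in value:
        return "data_uri"
    if value.startswith(("http://", "https://")):
        return "url"
    # Assumed bucket schemes; other storage providers use different prefixes.
    if value.startswith(("s3://", "gs://")):
        return "bucket"
    # Anything else is treated as a local file path.
    return "file_path"


def decode_data_uri(value: str) -> bytes:
    """Return the raw bytes encoded in a base64 data URI."""
    # Everything after the ';base64,' marker is the encoded payload.
    _header, _sep, payload = value.partition(";base64,")
    return base64.b64decode(payload)


for sample in (
    "data:image/png;base64,aGVsbG8=",
    "https://example.com/cat.png",
    "s3://my-bucket/cat.png",
    "images/cat.png",
):
    print(sample, "->", classify_image_source(sample))
```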

Summary

Hopefully, this simple walkthrough gives you a good starting point for visualizing Prodigy datasets using W&B. In the future, we plan to add more visualizations, such as audio, bounding boxes, and masks, and to expand the number of fields that can be converted to HTML and images. We'd love to see any experiments you're excited about and to hear any feedback you have. Thanks!