Extend a Preset: Histogram Bins

How to adapt a W&B Custom Chart preset for your use case. Made by Stacey Svetlichnaya using Weights & Biases

Custom charts are fully editable

How do you modify one of the W&B Custom Charts for your particular project? In this report, I extend the Custom Histogram to

You can follow these steps to customize any of our presets (line plot, bar plot, scatter plot, histogram, PR curve, and ROC curve).

The default histogram appears on the left below (from Custom Histogram). On the right, I've edited the default histogram to use smaller bins and cut off at a count of 50 so I can see the fine detail. If you'd like to try this yourself in a Colab notebook, I recommend

Powerful interactions

In both charts, you can zoom, pan, and hover to see more information. Both charts show four different model variants (same validation data, different epoch counts and numbers of training examples) on the same axes for easy comparison. You can use the "eye" icons to the left of the run names to toggle the display of individual runs on/off.

What does this chart mean?

I fine-tune a CNN to predict 10 classes of living things: plants, birds, insects, etc. I want to see a frequency count of prediction confidence scores and how they vary across classes and model variants. For example, is a model more confident on certain classes (histogram peaks at low and high scores) than others (flat, even distribution across bins)? For each run, I vary NT, the number of training examples, and E, the number of training epochs. Both numbers are tiny for illustration purposes. When these are too small, the model gives very low confidence scores (<0.1). With increasing epochs and training examples, we start to see more high confidence scores and some intermediate scores for the model's prediction confidence (in this case, that the image shows a bird).

Default histograms

This chart lets you sort a list of values into bins by count or frequency of occurrence. Let's say I have a list of prediction confidence scores (scores) for a model, and I want to see their distribution:

import wandb

# One row per score: wandb.Table expects a list of rows, each row a list
data = [[s] for s in scores]
table = wandb.Table(data=data, columns=["scores"])
wandb.log({'my_histogram': wandb.plot.histogram(table, "scores", title=None)})

Note that data is a list of lists, intended to support a 2D array of rows and columns.
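
To make the list-of-lists shape and the idea of binning concrete, here is a minimal sketch in plain Python with no W&B calls. The helper names to_rows and bin_counts and the sample scores are hypothetical and only for illustration. In the actual chart the binning is done by the Vega spec, which may place bin edges differently.

```python
import math
from collections import Counter

def to_rows(scores):
    """Wrap each score in its own list: one row per score, one column."""
    return [[s] for s in scores]

def bin_counts(scores, step=0.25):
    """Count how many scores fall into each bin of width `step`.

    Returns {bin_index: count}; bin i covers [i * step, (i + 1) * step).
    """
    counts = Counter(math.floor(s / step) for s in scores)
    return dict(sorted(counts.items()))

# Made-up confidence scores, chosen away from bin edges
scores = [0.05, 0.07, 0.4, 0.92, 0.95, 0.97]
print(to_rows(scores)[:2])            # [[0.05], [0.07]]
print(bin_counts(scores, step=0.25))  # {0: 2, 1: 1, 3: 3}
```

A distribution like this one, with most of its mass in the lowest and highest bins, is the "peaks at low and high scores" pattern described earlier.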

Customization steps: Edit the Vega spec

Taking the first histogram in this report as an example, I want to make two changes: use smaller bins and cut off the y-axis at a count of 50.

  1. Hover over the top right corner of an existing chart and click the "edit" pencil to open the custom chart modal. Here you can change the query fields, or how your data is loaded into the histogram, if needed.

  2. Click "Edit" in the top left, next to the name of the W&B global preset you're currently using, to open the interactive visualization editor. The Vega spec on the left is a full definition of the chart in the Vega visualization grammar. You can find lots of Vega tutorials and examples online, and it's very easy to tinker with small details in this JSON format.

  3. Iteratively make changes to the Vega spec and see their effect. If you're not sure how to make the changes, search for relevant Vega examples (e.g. I used this reference on binning in histograms). Our IDE is also very friendly to iterative development, and I've found Vega syntax to be reasonably intuitive. Here I change two lines: the "bin" setting in the "x" encoding, to use a fixed step of 0.025, and the "scale" domain in the "y" encoding, to cap the count axis at 50.

Here is my full Vega spec:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "A simple histogram",
  "data": {
    "name": "wandb"
  },
  "selection": {
    "grid": {
      "type": "interval", "bind": "scales"
    }
  },
  "title": "${field:title}",
  "mark": {"type": "bar", "tooltip": {"content": "data"}},
  "encoding": {
    "x": {
      "bin" : {"binned" : false, "step" : 0.025},
      "type": "quantitative",
      "field": "${field:value}"
    },
    "y": {
      "aggregate": "count",
      "scale" : {"domain" : [0, 50]}, 
      "stack": null
    },
    "opacity": {"value": 0.6},
    "detail": [{"field": "name"}, {"field": "color"}],
    "color": {
      "type": "nominal",
      "field": "name",
      "scale": {"range": {"field": "color"}}
    }
  }
}
  4. When you're happy with your changes, save the result so you can reuse it. I named my chart "histogram_small_bins" and made it publicly accessible so that everyone who reads this report can view charts of that type. I recommend this setting if you're sharing the report with anyone and want them to be able to see your charts.
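
The two edits in the spec above can also be expressed as a small Python helper, which may be handy if you want to try several bin widths and paste the resulting JSON into the editor. This is only a sketch: the function name with_small_bins and the trimmed-down base_spec are hypothetical (base_spec is not the exact W&B default), and the code only builds JSON locally; it does not upload or save a preset.

```python
import copy
import json

def with_small_bins(spec, step=0.025, y_max=50):
    """Return a copy of a histogram spec with a fixed bin width on x
    and a fixed count range on y. The input spec is left unchanged."""
    new_spec = copy.deepcopy(spec)
    encoding = new_spec.setdefault("encoding", {})
    encoding.setdefault("x", {})["bin"] = {"binned": False, "step": step}
    encoding.setdefault("y", {})["scale"] = {"domain": [0, y_max]}
    return new_spec

# Hypothetical starting point, for illustration only
base_spec = {
    "mark": "bar",
    "encoding": {
        "x": {"bin": True, "type": "quantitative", "field": "${field:value}"},
        "y": {"aggregate": "count", "stack": None},
    },
}

edited = with_small_bins(base_spec)
# Print the edited encoding as JSON, ready to paste into the spec editor
print(json.dumps(edited["encoding"], indent=2))
```

Calling with_small_bins(base_spec, y_max=100) would give the same spec with a taller count axis.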

Reusing the new custom preset

Since I've already logged some histograms using the default global "Histogram" preset, I can now change them to my improved custom "histogram_small_bins" format. Click the pencil to edit, find the new preset's name in the dropdown menu from the top left, and select the one you'd like to see. Here I show two before & after charts of scores for different classes (mollusks and reptiles). You can see how the score distribution for each class shifts as the model sees more examples over more epochs, from light green (wider distribution) to blues (more low scores) to purple (bimodal at high and low). You can toggle the individual model variants on/off using the "eye" icon to show more/fewer overlapping distributions.

Shared presets vs one-off edits

Note that you can also make further one-off changes on top of your new preset. For example, in the bottom right, I've edited the Vega spec to show a vertical range of 100. This change is local to this single chart instance and won't modify the "histogram_small_bins" preset. Each time you edit the Vega spec, you can decide whether to save your edits to the shared preset (via "Push changes") or only apply them to the current chart (via "Detach").

Logging directly from Python

Beyond editing these charts in the UI, you can now log them directly from Python. Once I'm happy with my "histogram_small_bins" preset, and I have the prediction scores for a particular class (say class=Plantae) from my validation step as an array of floats called plantae_scores, I can log the chart like this:

import wandb

# One row per prediction: [example index, confidence score]
data = [[ndx, score] for ndx, score in enumerate(plantae_scores)]
table = wandb.Table(data=data, columns=["id", "score"])
# Map the preset's "value" field to the "score" column and set the chart title
fields = {"value": "score", "title": "Plantae prediction scores"}
custom_histogram = wandb.plot_table(
    vega_spec_name="wandb/histogram_small_bins",
    data_table=table,
    fields=fields)
wandb.log({"custom_id": custom_histogram})

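The steps in more detail:

  1. Build the rows: one [id, score] pair per prediction.
  2. Create a wandb.Table from those rows, with "id" and "score" as the column names.
  3. Map the preset's fields to your data: "value" points at the "score" column, and "title" sets the chart title.
  4. Call wandb.plot_table() with the preset name, the table, and the field mapping.
  5. Log the resulting chart with wandb.log().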

For a slightly fancier example, check out this Colab. I didn't want to specify each of the 10 classes by hand, so I just log all of them for each run, creating 10 versions of my new preset in a for loop (see 9 below :). All of these charts have the settings adjustments I made to "histogram_small_bins", starting from the "Histogram" global preset. The overall trend in score distributions is about the same: with more training examples and more epochs, the score distribution narrows to a bimodal one. It's interesting to explore the differences between classes and model variants (toggle the runs on & off). Keep in mind that these are small (as few as 100 training examples for only 1 epoch) and thus fairly noisy experiments.
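
The for loop mentioned above might look roughly like the sketch below. It is not the code from the Colab: the helper names build_chart_inputs and log_class_histograms, the scores_by_class dictionary, and the per-class log keys are all assumptions for illustration. It reuses the same wandb.Table, wandb.plot_table, and wandb.log calls as the single-class example and assumes a run has already been started with wandb.init().

```python
def build_chart_inputs(class_name, scores):
    """Return (rows, columns, fields) for one class's histogram."""
    rows = [[ndx, score] for ndx, score in enumerate(scores)]
    columns = ["id", "score"]
    fields = {"value": "score", "title": f"{class_name} prediction scores"}
    return rows, columns, fields

def log_class_histograms(scores_by_class, preset="wandb/histogram_small_bins"):
    """Log one custom histogram per class to the active W&B run.

    `scores_by_class` is a hypothetical dict: class name -> list of floats.
    """
    import wandb  # imported lazily so build_chart_inputs works without wandb

    for class_name, scores in scores_by_class.items():
        rows, columns, fields = build_chart_inputs(class_name, scores)
        table = wandb.Table(data=rows, columns=columns)
        chart = wandb.plot_table(
            vega_spec_name=preset, data_table=table, fields=fields
        )
        # One log key per class, so each class gets its own chart
        wandb.log({f"{class_name}_histogram": chart})
```

Keeping the row and field construction in a separate function makes it easy to check the inputs for one class before logging all ten.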

Build precisely the chart you want

These presets and our accompanying APIs aim to be simple and general, but for most machine learning work, a "standard" approach is insufficient. This is why we recently launched an interactive dev environment for machine learning data visualization. It greatly expands the set of possible charts folks can log natively to W&B (check out the presets in our gallery and the types of queries they can make, e.g. different ways to aggregate, filter, combine, and otherwise parse logged data). Most importantly, it lets you interactively adjust your visualizations to fit your exact requirements and then share them with teammates & the whole field.

We're very excited for folks to try this feature. Please let us know how it goes in the Comments section below.