Create Your Own Preset: Composite Histogram

Log to a custom chart from Python with W&B. Made by Stacey Svetlichnaya using Weights & Biases

Create a chart interactively; plot data from a script

In this report I describe how to create a custom chart, such as the composite histogram below, with the W&B UI and then log to this custom chart directly from a script (Python training code). Click on the gear icon in the top right to see a slider that adjusts the bin size, giving a slightly different perspective on the data. This chart shows confidence scores of an image classification CNN for two different labels on the same images, specifically for identifying the creature in each image as an amphibian versus a reptile. This toy model is a bit more certain about amphibians (narrower, higher peak) than reptiles (broader range of scores).

Why use custom charts?

If you're already convinced, skip to the next section for the instructions. Some advantages of this approach:

Step 1: Log your data as a wandb.Table()

To read data from your runs into a chart, first log that data to a wandb.Table() object, which is a 2D array where each row is a data point and each column is a dimension/feature of that data point. For example, I want to look at the distribution of prediction confidence scores across all 10 classes of my model (a CNN finetuned on photos of 10 classes of living things: birds, insects, plants, etc). Each row in my data is an image from my validation set, and each column is a confidence score for a particular class. Let's say I start with a dictionary of class prediction scores (class_preds) where the keys are class ids and the values are lists of confidence scores, one per validation image, in a fixed order. Once the table below is logged, all my prediction scores will be available as arrays under the corresponding column name.

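import wandb

# Assumes wandb.init() has already been called and that class_preds maps
# class id -> list of confidence scores, one per validation image.
class_names = ["Amphibia", "Animalia", "Arachnida", "Aves", "Fungi", "Insecta", "Mammalia", "Mollusca", "Plantae", "Reptilia"]
data = []
for val_item in range(len(class_preds[0])):
    # one row per image: its confidence score for each class, in class_names order
    row = [class_preds[label][val_item] for label in range(len(class_names))]
    data.append(row)

table = wandb.Table(data=data, columns=class_names)
wandb.log({"custom_table_id": table})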
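
The loop above is effectively a transpose: per-class lists of scores become per-image rows. Here is a minimal sketch of the same step using zip, run on a made-up two-class, three-image class_preds. The helper name preds_to_rows and the toy values are my own, for illustration only, and are not part of wandb.

```python
def preds_to_rows(class_preds, num_classes):
    """Transpose {class_id: [score per image]} into one row of scores per image."""
    per_class = [class_preds[label] for label in range(num_classes)]
    # zip(*...) groups the i-th score of every class, giving one row per image
    return [list(row) for row in zip(*per_class)]

# Hypothetical toy predictions: 2 classes, 3 validation images
toy_preds = {0: [0.9, 0.2, 0.5], 1: [0.1, 0.8, 0.5]}
rows = preds_to_rows(toy_preds, num_classes=2)
```

The resulting rows could be passed as data= to wandb.Table() in place of the loop-built list.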

Step 2: Create your custom preset in the UI

W&B custom charts are written in Vega, a powerful and flexible visualization language. You can find many examples and walkthroughs online, and it can help to start with an existing preset that is most similar to your desired custom visualization. You can iterate with small changes in our IDE, which re-renders the plot as you change its definition. You can see the final Vega spec for my "composite_histogram" at the end of this report.

Step 3: Map relevant data fields from logged runs into your chart

Modify the run query to feed your run data from Step 1 into the chart: change summary to summaryTable in the query and enter your custom_table_id (the key under which you logged the table in Step 1) as the first entry in keys. In my case, I logged a table under the key hs_0_table, and having logged a column for each of my 10 class names, I chose the "Animalia" column to show up as red_bins and "Plantae" to show up as blue_bins. The title can be hardcoded directly in the Vega spec, or you can use the special placeholder ${field:title:Default Chart Title Here} to allow for a user-defined input string for each instance of the custom chart.
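
To build intuition for what the field mapping does, here is a toy Python sketch that substitutes ${field:name} placeholders in a piece of spec text. It is only a mental model and not how W&B implements the mapping. The function fill_fields, the regular expression, and the fall-back-to-default behavior are all my assumptions.

```python
import re

# Matches ${field:name} and ${field:name:default} (assumed placeholder shapes)
PLACEHOLDER = re.compile(r"\$\{field:([^:}]+)(?::([^}]*))?\}")

def fill_fields(spec_text, fields):
    """Replace each placeholder with its mapped value, else its default if one is given."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in fields:
            return fields[name]
        if default is not None:
            return default
        return match.group(0)  # leave unmapped placeholders untouched
    return PLACEHOLDER.sub(substitute, spec_text)

snippet = '{"field": "${field:red_bins}", "text": "${field:title:Default Chart Title Here}"}'
filled = fill_fields(snippet, {"red_bins": "Animalia"})
```

In this sketch, red_bins resolves to the "Animalia" column name, while the unmapped title falls back to the text after the second colon.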

Iterate on the Vega spec until you are happy. The panel on the right will re-render interactively as you change the spec or the query. When you are done, make sure to save the Vega spec as a preset (hit "Save As") with a descriptive name (e.g. composite_histogram for me) so you can reuse it within the current project. If you'd like other people to be able to see your report, make sure the preset is publicly accessible.

Step 4: Log to your saved preset from your script!

For any future runs, you can log to your custom preset directly from a Python script with the wandb.plot_table() method:

# set up the field mapping for each two-way class comparison
# (table is the wandb.Table created in Step 1)
animal_plant = {"red_bins": "Animalia", "blue_bins": "Plantae", "title": "Animals (red) vs Plants (blue)"}
rep_amph = {"red_bins": "Amphibia", "blue_bins": "Reptilia", "title": "Amphibians (red) vs Reptiles (blue)"}
ara_ins = {"red_bins": "Arachnida", "blue_bins": "Insecta", "title": "Arachnids (red) vs Insects (blue)"}
bird_fun = {"red_bins": "Aves", "blue_bins": "Fungi", "title": "Birds (red) vs Mushrooms (blue)"}

for i, fields in enumerate([animal_plant, rep_amph, ara_ins, bird_fun]):
    # create a composite histogram for each two-way comparison
    composite_histogram = wandb.plot_table(vega_spec_name="wandb/composite_histogram", data_table=table, fields=fields)
    # log under a different key to create a separate plot
    wandb.log({"hs_" + str(i): composite_histogram})
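
The four field mappings share one shape, so they can also be generated with f-strings instead of being written out by hand. This is a small sketch. The helper name comparison_fields is hypothetical, and the class and label pairs are the ones used above.

```python
def comparison_fields(red_class, blue_class, red_label, blue_label):
    """Build the fields dict for one two-way comparison chart."""
    return {
        "red_bins": red_class,
        "blue_bins": blue_class,
        "title": f"{red_label} (red) vs {blue_label} (blue)",
    }

# (red column, blue column, red display label, blue display label)
pairs = [
    ("Animalia", "Plantae", "Animals", "Plants"),
    ("Amphibia", "Reptilia", "Amphibians", "Reptiles"),
    ("Arachnida", "Insecta", "Arachnids", "Insects"),
    ("Aves", "Fungi", "Birds", "Mushrooms"),
]
all_fields = [comparison_fields(*pair) for pair in pairs]
```

Each entry of all_fields can then be passed as fields= to wandb.plot_table() inside the logging loop.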

Examples: 100 training samples, 1 epoch

2-way comparisons of class confidence score distributions after very minimal training. Since we're looking at so few training examples, these results are fairly stochastic. You can click on the gear icon in the top right of each plot to adjust the bin size. In this particular experiment, the model returns a much wider range of confidence scores for Amphibians than for Reptiles/Animals/Plants, which are virtually indistinguishable in this view.

More examples: 1000 examples, 1 epoch

Some class pairings converge to more similar scores (Animals vs Plants keep a matching distribution) and some diverge (Birds vs Mushrooms go from a near-identical histogram to complete opposites).

P.S. Full Vega Spec for the Composite Histogram

Here is the full Vega spec for my custom composite histogram preset. It is intended as an MVP, and we welcome suggestions for improving or extending it (3 layers? N layers? more custom colors?). Try it and let us know how it goes! We especially welcome any feedback via the Report comments at the bottom of this page.

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "padding": 5,
  "signals": [
    {
      "name": "binOffset",
      "value": 0
    },
    {
      "name": "binStep",
      "value": 0.025,
      "bind": {
        "input": "range",
        "min": 0.01,
        "max": 0.5,
        "step": 0.001
      }
    }
  ],
  "data": [
    {"name" : "wandb"},
    {
      "name": "red_bins",
      "source": "wandb",
      "transform": [
        {
          "type": "bin",
          "field": "${field:red_bins}",
          "extent": [
            -1,
            1
          ],
          "anchor": {
            "signal": "binOffset"
          },
          "step": {
            "signal": "binStep"
          },
          "nice": false
        },
        {
          "type": "aggregate",
          "key": "bin0",
          "groupby": [
            "bin0",
            "bin1"
          ],
          "fields": [
            "bin0"
          ],
          "ops": [
            "count"
          ],
          "as": [
            "count"
          ]
        }
      ]
    },
    {
      "name": "blue_bins",
      "source": "wandb",
      "transform": [
        {
          "type": "bin",
          "field": "${field:blue_bins}",
          "extent": [
            -1,
            1
          ],
          "anchor": {
            "signal": "binOffset"
          },
          "step": {
            "signal": "binStep"
          },
          "nice": false
        },
        {
          "type": "aggregate",
          "key": "bin0",
          "groupby": [
            "bin0",
            "bin1"
          ],
          "fields": [
            "bin0"
          ],
          "ops": [
            "count"
          ],
          "as": [
            "count2"
          ]
        }
      ]
    }
  ],
  "title": {
    "text": "${field:title}"
  },
  "scales": [
    {
      "name": "xscale",
      "type": "linear",
      "range": "width",
      "domain": [0.0, 1.0]
    },
    {
      "name": "yscale",
      "type": "linear",
      "range": "height",
      "round": true,
      "domain": {
        "fields" : [
          {"data" : "red_bins", "field" : "count"},
          {"data" : "blue_bins", "field" : "count2"}
        ]
      },
      "zero": true,
      "nice": true
    }
  ],
  "axes": [
    {
      "orient": "bottom",
      "scale": "xscale",
      "zindex": 1,
      "title": "Frequency count"
    },
    {
      "orient": "left",
      "scale": "yscale",
      "tickCount": 5,
      "zindex": 1,
      "title" : "Number of examples"
    }
  ],
  "marks": [
    {
      "type": "rect",
      "from": {
        "data": "red_bins"
      },
      "encode": {
        "update": {
          "x": {
            "scale": "xscale",
            "field": "bin0"
          },
          "x2": {
            "scale": "xscale",
            "field": "bin1",
            "offset": {
              "signal": "binStep > 0.02 ? -0.1 : 0"
            }
          },
          "y": {
            "scale": "yscale",
            "field": "count"
          },
          "y2": {
            "scale": "yscale",
            "value": 0
          },
          "fill": {
            "value": "firebrick"
          },
          "fillOpacity": {
            "value" : 0.5
          }
        },
        "hover": {
          "fill": {
            "value": "purple"
          }
        }
      }
    },
    {
      "type": "rect",
      "from": {
        "data": "blue_bins"
      },
      "encode": {
        "update": {
          "x": {
            "scale": "xscale",
            "field": "bin0"
          },
          "x2": {
            "scale": "xscale",
            "field": "bin1",
            "offset": {
              "signal": "binStep > 0.02 ? -0.1 : 0"
            }
          },
          "y": {
            "scale": "yscale",
            "field": "count2"
          },
          "y2": {
            "scale": "yscale",
            "value": 0
          },
          "fill": {
            "value": "steelblue"
          },
          "fillOpacity": {
            "value" : 0.5
          }
        },
        "hover": {
          "fill": {
            "value": "purple"
          }
        }
      }
    }
  ]
}
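
To make the bin and aggregate transforms in the spec more concrete, here is a rough Python approximation of what they compute for one column: each value is assigned to a bin of width binStep, aligned to binOffset, and the rows in each bin are counted. This is a sketch of the idea and not Vega's actual implementation. The function name bin_counts and the sample scores are made up for illustration.

```python
import math
from collections import Counter

def bin_counts(values, step, anchor=0.0):
    """Count values per bin; step plays the role of binStep, anchor of binOffset."""
    counts = Counter()
    for v in values:
        # left edge of the bin containing v, aligned to the anchor
        bin0 = anchor + math.floor((v - anchor) / step) * step
        counts[bin0] += 1
    # mirror the spec's output fields: bin0 (left edge), bin1 (right edge), count
    return [{"bin0": b, "bin1": b + step, "count": n} for b, n in sorted(counts.items())]

# Hypothetical confidence scores for one class
scores = [0.1, 0.2, 0.3, 0.6, 0.9]
hist = bin_counts(scores, step=0.25)
```

Moving the bin size slider corresponds to changing step here: a larger step merges neighboring bins into fewer, taller bars.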