Viewing, Comparing, and Sharing PyTorch Traces in W&B
Get performance-critical data out of ephemeral tracebacks and GPU monitoring tools and into interactive, shareable dashboards.
Created on August 24|Last edited on August 25
Interactive Trace Viewers
Using the Trace Viewer
The interactive panels above show software traces from a few steps of training a simple ConvNet on the MNIST dataset. The two panels come from two different runs with different parameters: the run on the left uses num_workers=1, and the run on the right uses num_workers=0.
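For reference, the only configuration difference between the two runs is the DataLoader's num_workers argument. Here's a minimal sketch of the two setups; the stand-in dataset and batch size are placeholders, not the exact values used in the runs above:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset with MNIST-shaped tensors (64 samples of 1x28x28)
ds = TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,)))

# Left run: one background worker process prepares batches
left_loader = DataLoader(ds, batch_size=32, num_workers=1)

# Right run: batches are prepared in the main process
right_loader = DataLoader(ds, batch_size=32, num_workers=0)
```

With num_workers=0, batch preparation happens in the same process as the training loop, which is one common reason a GPU stream sits idle between steps, as in the right-hand trace.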
The top section shows the processes, threads, and streams running on the CPU and GPU. The extent of a single training step (forward pass, backward pass, and parameter update) is indicated by the bars labeled ProfilerStep#X. Traces for CPU processes appear above, and traces for GPU processes appear below.
Each colored block can be clicked for more information (e.g. call stack, function name, and duration). This information appears in a pane below the trace itself, separated by a gray bar. Depending on your browser window size and resolution, you may need to drag the separator bar between the panes (above "Nothing selected. Tap stuff." if you haven't clicked any of the colored blocks) to view the traces and details more easily.
Click and drag to highlight multiple blocks and get a summary, including total and average duration for each type of operation.
The highlight tool is only one way to interact with the viewer; the small gray toolbar offers others. Click the four-way arrow to switch the mouse to panning, or the arrow pointing into and out of the screen to switch to zooming. We'll use the zoom as our anatomical microscope, so that we can observe fine details of specific steps.
The numbers and arrows in the bottom left corner can be used to scroll through a small gallery of traces. Click the arrows to browse traces of different runs using different configurations. You can also compare traces across runs, alongside metadata information like DataLoader parameters, in this W&B workspace. I recommend opening that workspace in a parallel browser window and following the instructions below to find interesting traces and identify their important features.
The trace information is produced by the PyTorch profiler and rendered using the Chrome Trace Viewer.
If you've already incorporated the PyTorch profiler into your code, you just need to upload your trace event files to W&B.
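If you haven't yet, producing a trace event file takes only a few lines with torch.profiler. A minimal sketch (the toy model here is a placeholder, not the ConvNet from the runs above):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Toy model standing in for the ConvNet
model = torch.nn.Linear(8, 4)
inputs = torch.randn(2, 8)

# Record CPU activity (add ProfilerActivity.CUDA when a GPU is available)
with profile(activities=[ProfilerActivity.CPU]) as prof:
    model(inputs)

# Write a Chrome-trace-format JSON file, ready to upload to W&B
prof.export_chrome_trace("trace.pt.trace.json")
```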
By uploading the traces to W&B instead of keeping them inside Tensorboard, you gain access to all of W&B's tracking, comparison, and sharing features. For example, you can put them inside a Report, like we did here, to share with colleagues, while still keeping the data co-located with the run that produced it (example).
It's also much easier to spin up multiple Trace Viewers so that you can compare results. Looking at the two traces at the top of this report, it's easy to quickly see that the GPU stream (stream 7, towards the bottom) for the run on the right is mostly empty -- the GPU is sitting idle for far longer than in the run on the left. The Trace Viewer makes the idleness visible and obvious, making it easier to diagnose and communicate. Subtler details and rich statistical data can be compared as well, including even Streaming Multiprocessor utilization in the latest version.
import wandb

with wandb.init(project="my-profiling-project") as run:
    profile_art = wandb.Artifact("trace", type="profile")
    profile_art.add_file("path/to/(unknown).pt.trace.json", "trace.pt.trace.json")
    run.log_artifact(profile_art)
That means you can connect these traces to the models that were trained during the same run and the datasets consumed or analyses produced by that run.
For more details on how to integrate the profiler into your PyTorch code, see the Colab linked at the top of the article.
Learn More!
If you'd like to see how to use the trace viewer to understand the optimization advice in the Karpathy tweet below, see this Report.