
Datadog PoV Guide

One-stop shop for everything you need to test during the W&B Pilot.
Access Weights & Biases here: https://datadog-aws-us-east.wandb.io/
💡 For any questions, post them in the wandb-datadog Slack channel.


Weights and Biases (W&B) 💫

Weights & Biases (W&B) is an MLOps platform built to facilitate collaboration and reproducibility across the machine learning development lifecycle. Machine learning projects can quickly become a mess without best practices in place to aid developers and scientists as they iterate on models and move them to production.
W&B is lightweight enough to work with whatever framework or platform teams are currently using, but enables teams to quickly start logging their important results to a central system of record. On top of this system of record, W&B has built visualization, automation, and documentation capabilities for better debugging, model tuning, and project management.

PoC Workshop Sessions

Date | Session | Recording | Topics Discussed
Aug 8, 2025 | PoC Planning Call | Recording (Gong) | Align on use cases and criteria to be tested during the PoC (Use Cases Doc)
Aug 20, 2025 | PoC Kickoff | Recording (Gong) | Kickoff meeting: demo and 101 walkthrough (Models 101 Notebook)

💡 W&B is built with scale in mind. Here's an example project with large-scale runs (thousands of runs, each logging 10k metrics over 100k steps).

W&B Installation & Authentication

To start using W&B, first install the Python package (if it isn't already installed):
pip install wandb
Once it's installed, authenticate your user account by logging in through the CLI or SDK. You should have received an email inviting you to sign up for the platform; after signing up, you can find your API token in the "Settings" section under your profile.
wandb login --host <YOUR W&B HOST URL> <YOUR API TOKEN>
OR through Python:
import os
import wandb

wandb.login(host=os.getenv("WANDB_BASE_URL"), key=os.getenv("WANDB_API_KEY"))
In headless environments, you can instead define the WANDB_API_KEY environment variable.
Once you are logged in, you are ready to track your workflows!
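As a quick smoke test, a minimal sketch of a first tracked run might look like the following. The project name, run name, and metrics here are illustrative placeholders, not part of this guide:

import wandb

# Start a run; project/name are placeholder values for this sketch
run = wandb.init(
    project="datadog-pov-sandbox",
    name="smoke-test-run",
    config={"learning_rate": 1e-3, "epochs": 5},
)

# Log a few dummy metrics so something shows up in the UI
for epoch in range(run.config.epochs):
    run.log({"epoch": epoch, "loss": 1.0 / (epoch + 1)})

run.finish()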

Focus Areas for the PoC

For each capability below, the bullets list the things to test (a minimal artifacts sketch follows this list):
1. Experiment Tracking
  • Logging parameters and metrics
  • Logging system metrics (CPU, GPU, memory)
  • Capturing console output (stdout, stderr)
  • Real-time ingestion of metrics
  • Saving views and sharing them with others
  • Generating reports and dashboards
  • Running hyperparameter tuning sweeps (grouping runs)
  • Scalability with high metrics cardinality
  • Framework compatibility (PyTorch, TensorFlow, scikit-learn, etc.)
  • Logging media and visual outputs (images, audio, etc.)
2. Artifacts
  • Versioning datasets, models, and outputs
  • Attaching artifacts to runs
  • Managing large files (datasets or models > 1GB)
  • Sharing and accessing artifacts across teams
  • Tracking lineage (e.g., which dataset and model were used in a specific run)
  • Retention policies and storage monitoring
  • Using artifacts programmatically via API/SDK
3. Registry
  • Registering models after training
  • Attaching metadata, metrics, and lineage to registered models
  • Promoting models through stages (e.g., staging → production)
  • Audit logging of model registration and promotion events
  • Searching/filtering registered models by tags or metrics
  • Access control and sharing policies
  • Integration with CI/CD pipelines for automated promotion
  • Pulling and using registered models in downstream jobs
  • Partitioned model support with custom registries
  • Scalability w.r.t. model versions (register a model with hundreds of thousands of versions)
  • Scalability w.r.t. partitioned models
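As a concrete starting point for several of the artifact items above (attaching artifacts to runs, versioning models, tracking lineage, using artifacts via the SDK), here is a minimal hedged sketch. The project name, artifact name, and file paths are placeholders:

import wandb

# Train-side run: version a model file as an artifact attached to this run
run = wandb.init(project="datadog-pov-sandbox", job_type="train")
artifact = wandb.Artifact(name="my-model", type="model",
                          metadata={"framework": "pytorch"})
artifact.add_file("model.pt")  # placeholder path to a serialized model
run.log_artifact(artifact)
run.finish()

# Downstream run: consuming a version records lineage automatically
consumer = wandb.init(project="datadog-pov-sandbox", job_type="evaluate")
model_artifact = consumer.use_artifact("my-model:latest")
model_dir = model_artifact.download()
consumer.finish()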



1: Experiment Tracking (more details and an example script in the Experiment Tracking section below)

2: Artifacts (more details and an example script in the Artifacts section below)

3: Registry (more details and an example script in the Registry section below)

Track any Python process or experiment with W&B's Experiment Tracking 🍽

Visualize and query dataframes via W&B Tables

Track and version any serialized data via W&B Artifacts Tracking and Versioning

House staged/candidate models via W&B's Registry

Tune Hyperparameters via W&B Sweeps (see the sketch after this list)

Organize visualizations and share your findings with collaborators via W&B Reports
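For hyperparameter tuning with Sweeps, a minimal hedged sketch follows. The search space, metric, and train function are placeholders for your own training code:

import wandb

def train():
    run = wandb.init()
    # Read the swept hyperparameter from the run config
    lr = run.config.learning_rate
    run.log({"val_loss": 1.0 / lr})  # placeholder metric
    run.finish()

sweep_config = {
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {"learning_rate": {"min": 1e-4, "max": 1e-1}},
}

# Create the sweep and run 5 trials in-process
sweep_id = wandb.sweep(sweep_config, project="datadog-pov-sandbox")
wandb.agent(sweep_id, function=train, count=5)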

Other Useful Resources

Import/Export API

All data logged to W&B can be accessed programmatically through the import/export API (also called the public API). This enables you to pull down run and artifact data, then filter and manipulate it however you like in Python.
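For example, a minimal sketch of pulling run data with the public API; the entity and project names are placeholders:

import wandb

api = wandb.Api()

# Fetch runs from a project, filtered on run state
runs = api.runs("my-entity/datadog-pov-sandbox",
                filters={"state": "finished"})

for run in runs:
    print(run.name, run.summary.get("loss"))
    history = run.history()  # pandas DataFrame of logged metrics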

Slack Alerts

You can set Slack alerts within a run that trigger when things happen in your training or evaluation scripts. For example, you may want to be notified when training finishes or when a metric exceeds a certain value.
Details on enabling these alerts on your dedicated deployment can be found here
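For example, a minimal hedged sketch of triggering an alert from a training script; the metric, threshold, and project name are placeholders:

import wandb
from wandb import AlertLevel

run = wandb.init(project="datadog-pov-sandbox")
accuracy = 0.71  # placeholder value from your evaluation loop

# Fire a Slack/email alert if the metric falls below a threshold
if accuracy < 0.8:
    run.alert(
        title="Low accuracy",
        text=f"Accuracy {accuracy} is below the 0.8 threshold",
        level=AlertLevel.WARN,
    )
run.finish()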

FAQs

W&B Models

1. I didn't name my run. Where is the run name coming from?
If you do not explicitly name your run, a random name is assigned to help identify it in the UI, for instance "pleasant-flower-4" or "misunderstood-glade-2".
2. How can I configure the name of the run in my training code?
At the top of your training script when you call wandb.init, pass in an experiment name, like this:
wandb.init(name="my_awesome_run")
3. If wandb crashes, could it crash my training run?
It is extremely important to us that we never interfere with your training runs. We run wandb in a separate process to make sure that if wandb somehow crashes, your training will nevertheless continue to run.
4. Why is a run marked crashed in W&B when it’s training fine locally?
This is likely a connection problem — if your server loses internet access and data stops syncing to W&B, we mark the run as crashed after a short period of retrying.
5. Does W&B support distributed training?
Yes. W&B supports distributed training; here's the detailed guide on how to log distributed training experiments.
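One common pattern from that guide is to log from the rank-0 process only. A minimal hedged sketch, assuming PyTorch DDP-style environment variables (set by launchers such as torchrun); project/run names are placeholders:

import os
import wandb

# Under torchrun/DDP, each process gets a RANK environment variable
rank = int(os.environ.get("RANK", 0))

if rank == 0:
    wandb.init(project="datadog-pov-sandbox", name="ddp-run")

# ... training loop runs on every process ...
loss = 0.42  # placeholder value computed during training

if rank == 0:
    wandb.log({"loss": loss})
    wandb.finish()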
6. Can I use PyTorch profiler with W&B?
Here's a detailed report that walks through using the PyTorch profiler with W&B along with this associated Colab notebook.
7. What happens when a TTL policy is set on an artifact that is linked to the Registry?
W&B disables the option to set a TTL policy on model artifacts linked to the Model Registry. This helps ensure that linked models do not accidentally expire if used in production workflows. More details can be found in the docs here
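For unlinked artifacts, a TTL can be set programmatically. A minimal sketch; the retention period, artifact name, and file path are placeholders:

from datetime import timedelta
import wandb

run = wandb.init(project="datadog-pov-sandbox")
artifact = wandb.Artifact(name="scratch-dataset", type="dataset")
artifact.add_file("data.csv")  # placeholder path

# Expire this artifact version 30 days after creation
artifact.ttl = timedelta(days=30)
run.log_artifact(artifact)
run.finish()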
8. How do I stop wandb from writing to my terminal or my jupyter notebook output?
Set the environment variable WANDB_SILENT to true.
In Python
os.environ["WANDB_SILENT"] = "true"
Within Jupyter Notebook
%env WANDB_SILENT=true
With Command Line
export WANDB_SILENT=true