
Stellantis POC / Evaluation

Report created by W&B to highlight capabilities as part of your evaluation.
Created on September 9|Last edited on September 14


Weights and Biases (W&B) 💫

Weights & Biases is an MLOps platform built to facilitate collaboration and reproducibility across the machine learning development lifecycle. Machine learning projects can quickly become a mess without best practices in place to aid developers and scientists as they iterate on models and move them to production.
W&B is lightweight enough to work with whatever framework or platform teams are currently using, but enables teams to quickly start logging their important results to a central system of record. On top of this system of record, W&B has built visualization, automation, and documentation capabilities for better debugging, model tuning, and project management.


Pilot Plan



Success Criteria Examples




Test Cases / Key Deliverables - Environment

  1. WandB to provide a full-featured product with which Stellantis will conduct the POC
  2. WandB to provide availability of product SME resources to assist with the POC
  3. WandB to provide product documentation covering all aspects of the product, including but not limited to:
     • Architecture
     • Data flows
     • Component interactions
     • Security controls, both for the product itself as well as for those devices that are identified by the product
     • Software Bill of Materials with version numbers for any open-source components used in the product
  4. WandB to run in Stellantis' environment of choice (Public Cloud/SaaS, Dedicated Cloud, VPC)

Test Cases / Key Deliverables - Demonstrate Product Performance and Effectiveness

Experiment Tracking 🍽

The entry point for W&B usage is Experiments. The SDK has a few core primitives that make up the experiment tracking and logging system, allowing you to log nearly anything with W&B: scalar metrics, images, video, custom plots, and more. Once logging is complete, the experiment needs to be contextualized, and W&B provides the means to accomplish this via Reports.
Even though experiment tracking is easy to instrument on its own, there are also great integrations available that make it even easier for popular frameworks and libraries!


Minimal code setup needed to use the product

Getting started with W&B is a simple procedure:
!pip install wandb

import wandb

with wandb.init(project="my_project") as run:
    run.log({"metric": 0.1})
This is the common pattern employed, and it simplifies significantly when using an available integration.

Ability to log model stats and results per training run

  • Default stats for models: Loss, Epochs, Accuracy, other common stats
  • Custom / Ad Hoc Stats
  • User Specified
Through the W&B Python client, you can log anything to W&B. We support logging data as key-value pairs (think Python dictionary), where values can take on many different data types. Below, we show how logging model configuration, loss, accuracy, and other common stats manifests. The plots below were created within a project and imported directly into this report.
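As a minimal sketch of that pattern (the project name, config keys, and metric values here are illustrative placeholders rather than detail from this project), per-run configuration and stats might be logged like this:

import wandb

# Hypothetical hyperparameters; any dictionary of configuration values works
config = {"learning_rate": 1e-3, "epochs": 5, "batch_size": 64}

with wandb.init(project="my_project", config=config) as run:
    for epoch in range(config["epochs"]):
        # Placeholder numbers standing in for real training results
        train_loss = 0.5 / (epoch + 1)
        val_accuracy = 0.80 + 0.02 * epoch
        run.log({"epoch": epoch, "loss": train_loss, "accuracy": val_accuracy})

Every key passed to run.log shows up as a chart and summary value in the project workspace.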


In some cases you might be exploring more than one modeling methodology, and you can leverage W&B to aggregate the detail logged for each model. The first example below shows feature importance for all available features across all trained models (error bars represent the 25th and 75th percentiles respectively). The second example presents similar detail as a heat map, where the y-axis represents the model, the x-axis represents the feature, and the cell value represents the importance of the feature for that model. This is considered advanced usage of W&B Custom Charting.
You can also log plots made in your Python environment. We provide methods to log Matplotlib figures as well as Plotly figures, using wandb.Image and wandb.Plotly respectively. In fact, there is a large set of data types which W&B can log. Please see the details in the docs.
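As a sketch with placeholder data (the project name, chart keys, and figures are illustrative), logging both figure types looks roughly like this:

import matplotlib.pyplot as plt
import plotly.graph_objects as go
import wandb

# Build a simple Matplotlib figure and a simple Plotly figure from placeholder data
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0.9, 0.5, 0.3], label="loss")
ax.legend()

plotly_fig = go.Figure(data=go.Scatter(y=[0.9, 0.5, 0.3], name="loss"))

with wandb.init(project="my_project") as run:
    run.log({"matplotlib_chart": wandb.Image(fig)})      # logged as a static image
    run.log({"plotly_chart": wandb.Plotly(plotly_fig)})  # logged as an interactive chart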



Ability to log model code per training run

In our Python client, when an experiment is initialized, if the user passes the save_code=True flag, code will be captured. Moreover, if the user is in a notebook, the entire notebook can be logged automatically, and we do this on a per-experiment basis. We also permit comparisons of code between runs, as seen below.
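A minimal sketch of enabling code capture (the project name and metric are placeholders):

import wandb

# save_code=True captures the launching script or notebook for this run
with wandb.init(project="my_project", save_code=True) as run:
    run.log({"metric": 0.1})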




Ability to log system performance metrics per training run

System performance metrics are logged automatically when a user begins their experiment.


W&B supports rendering PyTorch traces using the Chrome Trace Viewer. There is an excellent W&B report available if you would like to dive deeper into the topic.
The setup can be particularly simple if you are already using PyTorch Lightning for your model development.
import glob

import wandb
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger
from torch.utils.data import DataLoader

# Log all new checkpoints during training
wandb_logger = WandbLogger(project='MNIST', log_model='all', save_code=True)

# Using raw DataLoaders, rather than a LightningDataModule, for greater transparency
# (training_set, validation_set, MNIST_LitModule, and the callbacks are defined earlier in the notebook)
training_loader = DataLoader(training_set, batch_size=64, shuffle=True, pin_memory=True)
validation_loader = DataLoader(validation_set, batch_size=64, pin_memory=True)

# Set up model
model = MNIST_LitModule(n_layer_1=128, n_layer_2=128)

trainer = Trainer(gpus=None, max_epochs=5, profiler="pytorch", logger=wandb_logger,
                  callbacks=[log_predictions_callback, checkpoint_callback],
                  precision=32)
trainer.profiler.dirpath = "/content/wandb/latest-run/tbprofile"
trainer.fit(model, training_loader, validation_loader)

# Upload the PyTorch Profiler traces as W&B Artifacts
trace_files = glob.glob("/content/wandb/latest-run/tbprofile/*.pt.trace.json")
for i, trace_file in enumerate(trace_files):
    if "training_step" in trace_file:
        profile_art = wandb.Artifact(f"train-trace{i}-{wandb.run.id}", type="profile")
        profile_art.add_file(trace_file, "train_trace.pt.trace.json")
    else:
        profile_art = wandb.Artifact(f"validation-trace{i}-{wandb.run.id}", type="profile")
        profile_art.add_file(trace_file, "validation_trace.pt.trace.json")
    wandb.log_artifact(profile_art)
wandb.finish()
The logged trace can then be rendered in the UI; from there, you can share it via the UI itself or incorporate it into your reports as needed, and you can surface system usage alongside it.



Ability to log to different projects

It is simple to log to different projects. Consider a Python script which requires logging info to two separate projects, project1 and project2; it is as simple as
with wandb.init(project="project1") as run:
    run.log({"metric": 0.2})

with wandb.init(project="project2") as run:
    run.log({"metric": 0.3})
Once this data has been logged, you can create reports that span multiple projects using the /Panel grid command within a report.



Ability to import offline run data.

If you are in a situation with no connectivity but still want to run experiments locally and upload your data at another time, you may always do so via the wandb sync command.
In the example, we'll consider the case where wandb is being run in offline mode.
with wandb.init(project="my_project", mode="offline") as run:
    run.log({"metric": 0.02})
This requires you to perform a sync with W&B after the fact. All of these offline runs get placed in ./wandb/offline*. From there, the sync could look like the following:
import glob
import subprocess

offline_runs = glob.glob("./wandb/offline*")
for run in offline_runs:
    output = subprocess.run(["wandb", "sync", run], stdout=subprocess.PIPE)
    print(output.stdout)
You can also use the resume functionality of wandb.init to have W&B automatically resume runs that have crashed or exited unsuccessfully. Please see the docs.
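As a rough sketch of resuming (the run ID here is a made-up placeholder), re-attaching to an earlier run can look like this:

import wandb

# resume="must" re-attaches to the existing run with this ID and fails if it does not exist
with wandb.init(project="my_project", id="abc123xy", resume="must") as run:
    run.log({"metric": 0.03})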

Ability to perform analysis on model runs and model run data

  • Visual Analytics
  • Numerical Analytics

Interactive Tables

Through our Tables product, you can easily log sample predictions and interact with data. Moreover, supporting evidence could include: Artifacts (models and datasets), custom charts, and other data.
Below is an example of a sample of predictions logged to W&B. Predictions were made on the image column, and we have provided the actual label as well as the prediction across all runs where predictions have been logged. The table supports interactive analysis.
W&B Tables enable a granular analysis of predictions and results through tabular data manipulation. Oftentimes, understanding a model's behavior during or after training requires more than seeing a clean loss curve go down and to the right. We need to understand where specifically the model fails, what examples are giving it trouble, where we might need to collect more training data/re-label, or maybe even uncover more nuanced errors like numerical instability.
Tables can be used as a model evaluation store, which stores consolidated results on golden validation datasets across different trained models in your project. They can also be used as model leaderboards, where each row is a model class or architecture with embedded explainability or custom performance charts alongside them. These are both best practices which you can start incorporating with a few lines of code.
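A minimal sketch of such a table (the project name, columns, and data are placeholders standing in for real evaluation outputs):

import numpy as np
import wandb

with wandb.init(project="my_project") as run:
    table = wandb.Table(columns=["image", "label", "prediction"])
    # Dummy rows standing in for real evaluation outputs
    for label, pred in [("cat", "cat"), ("dog", "cat")]:
        img = np.random.randint(0, 255, (28, 28), dtype=np.uint8)
        table.add_data(wandb.Image(img), label, pred)
    run.log({"sample_predictions": table})

Logged tables can then be filtered, grouped, and joined interactively in the UI or inside a report.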



Ability to log supporting evidence per training run, including prediction samples






Artifact Tracking and Versioning

Artifacts are the inputs and outputs of each part of your machine learning pipeline, namely datasets and models. Training datasets change over time as new data is collected, removed, or re-labeled; models change as new architectures are implemented and with continuous retraining. With these changes, all downstream tasks utilizing the changed datasets and models will be affected, and understanding this dependency chain is critical for debugging effectively. W&B can log this dependency graph easily with a few lines of code.
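A sketch of how that graph gets built, assuming hypothetical artifact names (training-data, my-model) and a local model file (model.pt) that already exist:

import wandb

with wandb.init(project="my_project", job_type="train") as run:
    # Declare the dataset this run depends on (the upstream edge of the graph)
    dataset = run.use_artifact("training-data:latest")
    data_dir = dataset.download()

    # ... train the model here, writing weights to model.pt ...

    # Log the trained model as a new artifact (the downstream edge of the graph)
    model_art = wandb.Artifact("my-model", type="model")
    model_art.add_file("model.pt")
    run.log_artifact(model_art)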


Ability to log model file itself

Through the usage of Artifacts, model files can easily be logged to W&B, and we've extended Artifacts with a Model Registry (shown below). Please see the docs for more detail on Model Management.
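For example, logging a serialized model file and linking it into a registered model could look roughly like the following; the registry path, aliases, and file name are placeholders, and run.link_artifact is assumed to be available in your SDK version:

import wandb

with wandb.init(project="my_project", job_type="train") as run:
    model_art = wandb.Artifact("credit-decision-model", type="model")
    model_art.add_file("model.pkl")  # placeholder path to the serialized model
    run.log_artifact(model_art)
    # Optionally promote this version into the Model Registry (target path and alias are assumptions)
    run.link_artifact(model_art, "model-registry/Credit Decision", aliases=["staging"])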

[Embedded Model Registry panel: registered model "Credit Decision LID 62a88bde90d0f2003c6a7bf9" (full name tim-w/model-registry/Credit Decision LID 62a88bde90d0f2003c6a7bf9, type: model, created June 14th, 2022, MRM ID 62a88bde90d0f2003c6a7bf9). Version v0 carries the aliases latest and production, is 343.4 kB in size, has one consuming run, and stores the full set of GBM training hyperparameters (seed, ntrees, max_depth, learn_rate, sample_rate, stopping_metric, and so on) as version metadata.]


Ability to log info about the data used to train the model

Not only are you able to log datasets via Artifacts, you can also log any supplementary info concerning your data. Imagine the instance where a training dataset is curated from three different datasets. We can log the schema to W&B as well as surface it in a report alongside the actual dataset detail logged to W&B.
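One way to attach that information is the artifact metadata field, sketched here with a hypothetical schema and file name:

import wandb

# Hypothetical schema describing the curated training dataset
schema = {
    "sources": ["dataset_a", "dataset_b", "dataset_c"],
    "columns": {"customer_id": "int", "limit_bal": "float", "default": "int"},
    "rows": 30000,
}

with wandb.init(project="my_project", job_type="dataset") as run:
    data_art = wandb.Artifact("processed-data", type="data", metadata=schema)
    data_art.add_file("processed_data.csv")  # placeholder file path
    run.log_artifact(data_art)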

[Embedded Artifact panel: dataset artifact "processed-data" (type: data, created July 28th, 2022). Version v0 carries the alias latest, is 3.4 MB in size, and has no consuming runs yet.]



Below is a dashboard created with Evidently AI and used to explore the data. Please excuse the messiness; it is a result of very long feature names.



Ability to retrieve / export all data logged

You may use either the W&B client or the W&B API to retrieve such data. Click into the Overview tab below and find the detail on model usage to see usage via the W&B API. Alternatively, click into the Usage tab below to see usage by way of the W&B client. These differ in that the latter couples the usage/download of the artifact with an experiment, whereas the former does not.
You may also retrieve other details via the API, such as experiment summaries, configurations, and the like. Please see the docs.
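A short sketch of retrieving logged data programmatically with the public API (the entity, project, run ID, and artifact alias are placeholders):

import wandb

api = wandb.Api()

# Pull metrics and configuration for a single run
run = api.run("my-entity/my_project/abc123xy")
history_df = run.history()   # logged metrics as a pandas DataFrame
summary = run.summary        # final summary values (dict-like)
config = run.config          # hyperparameters (dict-like)

# Download a logged artifact outside of any experiment
artifact = api.artifact("my-entity/my_project/0j2dhe2k-mnist-model:latest")
local_path = artifact.download()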

[Embedded Artifact panel: model artifact "0j2dhe2k-mnist-model" (type: model, created September 13th, 2022). Version v0 carries the alias latest, is 117.7 kB in size, and has no consuming runs yet.]

Reports

W&B reports help contextualize and document the system of record built through logging diagnostics and results from different pieces of your pipeline. Reports are interactive and dynamic, reflecting filtered run sets logged in W&B. You can add all sorts of assets to a report; the one you are reading now includes plots, tables, images, code, and nested reports.

Ability to present model development effort progress

This is exactly the purpose of Reports. You can contextualize experiments and projects, and share reports with colleagues and stakeholders.
