Feature Report: W&B Embeddings Projector

W&B's Embedding Projector allows users to plot multi-dimensional embeddings on a 2D plane using common dimension reduction algorithms like PCA, UMAP, and t-SNE.
Created on December 1 | Last edited on January 11

Intro

Embeddings represent objects (people, images, posts, words, etc.) as a list of numbers, sometimes referred to as a vector. In machine learning and data science, embeddings can be generated using a variety of approaches across a range of applications. This page assumes the reader is familiar with embeddings and is interested in visually analyzing them inside W&B.

Code

# Toy datasets: https://scikit-learn.org/stable/datasets/toy_dataset.html

import wandb
from sklearn.datasets import load_iris, load_diabetes, load_digits, load_wine, load_breast_cancer

def get_df_from_sklearn_dataset(loader_fn):
    # Load the dataset as a pandas DataFrame and attach the target column.
    ds = loader_fn(as_frame=True)
    df = ds.data
    df["target"] = ds.target
    # Move "target" to the front.
    cols = df.columns.tolist()
    df = df[cols[-1:] + cols[:-1]]
    if loader_fn is load_digits:
        # Each digits row holds 64 pixel values (0-16); render them as an 8x8 image.
        df["image"] = df.apply(lambda row: wandb.Image(row.iloc[1:].values.reshape(8, 8) / 16.0), axis=1)
        # Move "image" to the front.
        cols = df.columns.tolist()
        df = df[cols[-1:] + cols[:-1]]
    return df

def get_all_dfs():
    # Map each dataset name to its prepared DataFrame.
    return {name: get_df_from_sklearn_dataset(fn) for name, fn in ({
        "iris": load_iris,
        "diabetes": load_diabetes,
        "digits": load_digits,
        "wine": load_wine,
        "breast_cancer": load_breast_cancer,
    }).items()}

# Log every DataFrame under its dataset name in a single run.
wandb.init(project="toy_datasets")
wandb.log(get_all_dfs())
wandb.finish()
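Before logging, it can help to confirm the size of each dataset locally. The following is a minimal sketch, not part of the original report code: the `LOADERS` mapping and the `feature_shapes` helper are illustrative names, and the check uses only the scikit-learn loaders already imported above.

```python
from sklearn.datasets import (
    load_breast_cancer,
    load_diabetes,
    load_digits,
    load_iris,
    load_wine,
)

# Hypothetical mapping mirroring the dataset names used in the logging script.
LOADERS = {
    "iris": load_iris,
    "diabetes": load_diabetes,
    "digits": load_digits,
    "wine": load_wine,
    "breast_cancer": load_breast_cancer,
}


def feature_shapes():
    """Return {name: (n_records, n_features)} for each toy dataset."""
    return {name: fn().data.shape for name, fn in LOADERS.items()}


shapes = feature_shapes()
for name, (n_rows, n_cols) in shapes.items():
    print(f"{name}: {n_rows} records x {n_cols} dimensions")
```

The feature counts exclude the `target` and `image` columns that the logging script adds, so the logged tables will have one or two more columns than shown here.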

Iris Dataset (150 records x 4 dimensions - 3 Class Classification)

Wine Dataset (178 records x 13 dimensions - 3 Class Classification)

Diabetes Dataset (442 records x 10 dimensions - Regression)

Breast Cancer Dataset (569 records x 30 dimensions - Binary Classification)

Digits Dataset (1797 records x 64 (8x8) dimensions - 10 Class Classification w/ Media)
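As a rough local analogue of a 2D projection, the Digits features can be reduced with scikit-learn's PCA. This is a sketch under assumptions: it does not reproduce how W&B computes its projections, and `project_2d` is a hypothetical helper name introduced only for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA


def project_2d(features, random_state=0):
    """Project an (n_records, n_features) array onto its first two principal components."""
    pca = PCA(n_components=2, random_state=random_state)
    return pca.fit_transform(features)


digits = load_digits()
coords = project_2d(digits.data)  # 64-dimensional pixel vectors -> 2D points
print(coords.shape)  # (1797, 2)
```

Each row of `coords` is one digit image placed on a 2D plane; coloring the points by `digits.target` gives a view comparable in spirit to a projector panel colored by class.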