Feature Report: W&B Embedding Projector
W&B's Embedding Projector allows users to plot multi-dimensional embeddings on a 2D plane using common dimension reduction algorithms like PCA, UMAP, and t-SNE.
Created on December 1 | Last edited on January 11
Intro
Embeddings are used to represent objects (people, images, posts, words, etc.) with a list of numbers, sometimes referred to as a vector. In machine learning and data science, embeddings can be generated using a variety of approaches across a range of applications. This page assumes the reader is familiar with embeddings and is interested in visually analyzing them inside W&B.
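As a rough illustration of what "plotting embeddings on a 2D plane" means, the sketch below reduces the 4-dimensional Iris features to two dimensions with scikit-learn's PCA. It is only an assumption-laden stand-in for the kind of projection the projector displays, not W&B's own implementation; the variable names are made up for this example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Each Iris record is a 4-number vector; treat it as a tiny "embedding".
features = load_iris().data  # shape (150, 4)

# Project the 4D vectors onto their first two principal components.
projector = PCA(n_components=2)
points_2d = projector.fit_transform(features)

print(points_2d.shape)  # (150, 2)
```

Each row of `points_2d` is an (x, y) coordinate that could be drawn on a scatter plot, typically colored by the class label.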
Code
## https://scikit-learn.org/stable/datasets/toy_dataset.html
import wandb
from sklearn.datasets import load_iris, load_diabetes, load_digits, load_wine, load_breast_cancer

def get_df_from_sklearn_dataset(loader_fn):
    ds = loader_fn(as_frame=True)
    df = ds.data
    df["target"] = ds.target
    cols = df.columns.tolist()
    df = df[cols[-1:] + cols[:-1]]
    if loader_fn is load_digits:
        df["image"] = df.apply(lambda row: wandb.Image(row[1:].values.reshape(8, 8) / 16.0), axis=1)
        cols = df.columns.tolist()
        df = df[cols[-1:] + cols[:-1]]
    return df

def get_all_dfs():
    return {name: get_df_from_sklearn_dataset(fn) for name, fn in ({
        "iris": load_iris,
        "diabetes": load_diabetes,
        "digits": load_digits,
        "wine": load_wine,
        "breast_cancer": load_breast_cancer,
    }).items()}

wandb.init(project="toy_datasets")
wandb.log(get_all_dfs())
wandb.finish()
Iris Dataset (150 records x 4 dimensions - 3 Class Classification)
Wine Dataset (178 records x 13 dimensions - 3 Class Classification)
Diabetes Dataset (442 records x 10 dimensions - Regression)
Breast Cancer Dataset (569 records x 30 dimensions - Binary Classification)
Digits Dataset (1797 records x (8x8) dimensions - 10 Class Classification w/ Media)
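A quick way to check the record and dimension counts of these datasets is to load each one with scikit-learn and inspect the shape of its feature matrix. This is a minimal sketch; the `loaders` and `shapes` names are introduced here for illustration only.

```python
from sklearn.datasets import (
    load_breast_cancer,
    load_diabetes,
    load_digits,
    load_iris,
    load_wine,
)

# Map a short name to each scikit-learn toy dataset loader.
loaders = {
    "iris": load_iris,
    "wine": load_wine,
    "diabetes": load_diabetes,
    "breast_cancer": load_breast_cancer,
    "digits": load_digits,
}

# The feature matrix of each dataset has shape (records, dimensions).
shapes = {name: fn().data.shape for name, fn in loaders.items()}

for name, (n_records, n_dims) in shapes.items():
    print(f"{name}: {n_records} records x {n_dims} dimensions")
```

For the digits dataset, the 64 feature columns are the flattened pixels of an 8x8 image.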