Cell Discovery Catalog
Background: Variational Inference
Cell Type Labeling
Model Training
As mentioned above, once we have specified our likelihood and variational distribution, training consists of gradient descent on the negative ELBO. We train and evaluate this model on two datasets of cells from two different tissue types: blood and bone marrow. The peripheral blood mononuclear cell (PBMC) dataset contains roughly 20k samples with 20k+ genes, of which only 200 cells are labeled across 4 cell types. The marrow dataset has about 8,000 samples with 300 genes and 12 unique cell types.
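To make the objective concrete, here is a minimal NumPy sketch of a single-sample ELBO estimate for a Gaussian likelihood and a diagonal-Gaussian posterior. This is purely illustrative; the actual model uses a likelihood suited to count data, and the function names here are our own.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """Closed-form KL(q(z|x) || p(z)) for a diagonal Gaussian posterior
    against a standard normal prior, summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def elbo(x, x_recon_mean, mu, log_var):
    """Single-sample Monte Carlo ELBO estimate with a unit-variance
    Gaussian likelihood: reconstruction log-probability minus KL term."""
    recon_log_prob = -0.5 * np.sum((x - x_recon_mean) ** 2 + np.log(2 * np.pi), axis=-1)
    return recon_log_prob - gaussian_kl(mu, log_var)
```

Training then amounts to taking gradient steps on the negative of this quantity, averaged over minibatches.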
There are many hyper-parameters we could choose to tune, but variational auto-encoders are known to be fairly insensitive to hyper-parameters in the unsupervised setting, where we are only concerned with maximizing the log probability of the data. However, we can run sweeps with Weights & Biases to find good settings for the hyper-parameters that deserve more care, such as:
- The dimensionality of the latent spaces for $z_1$ and $z_2$
- $\alpha$, the weighting of the classification loss in the objective
- batch size
- learning rate decay
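A sweep over the hyper-parameters above can be described with a configuration dictionary like the following sketch. The parameter names, ranges, and metric here are illustrative assumptions, not the settings used for this model:

```python
# A possible Weights & Biases sweep configuration covering the
# hyper-parameters listed above (names and ranges are illustrative).
sweep_config = {
    "method": "bayes",
    "metric": {"name": "validation_accuracy", "goal": "maximize"},
    "parameters": {
        "latent_dim": {"values": [8, 16, 32]},   # dimensionality of z1 / z2
        "alpha": {"min": 1.0, "max": 100.0},     # classification-loss weight
        "batch_size": {"values": [64, 128, 256]},
        "lr_decay": {"min": 0.9, "max": 1.0},    # learning-rate decay factor
    },
}
```

Such a configuration would then be registered with `wandb.sweep` and executed with `wandb.agent`, assuming the wandb client is installed.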
In the semisupervised setting, we optimize the ELBO together with a classification loss. The supervised component predicts the class $y$ of a sample from its latent representation $z_2$. If $z_2$ is simply normally distributed, as in the mean-field form of the variational distribution, the resulting representation may not separate the different classes well in the latent space. A normalizing flow allows $z_2$ to take on an arbitrarily complex distribution, so that the classifier network of the encoder $q(y|z_2)$ can discriminate between the classes more easily. Indeed, we see better classification performance when a normalizing flow is employed. A normalizing flow is sometimes overkill, but Weights & Biases makes it easy to check quickly whether adding one offers any benefit.
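To make the flow idea concrete, here is a minimal NumPy sketch of a single planar-flow layer in the style of Rezende & Mohamed (2015). The log-determinant of the Jacobian it returns is what keeps the ELBO computable after $z_2$ is warped; the function name and shapes are our own illustrative choices:

```python
import numpy as np

def planar_flow(z, w, u, b):
    """One planar normalizing-flow transform f(z) = z + u * tanh(w.z + b),
    applied row-wise to a batch z of shape (batch, dim).
    Returns the transformed samples and log|det Jacobian| per sample."""
    a = z @ w + b                                  # pre-activation, (batch,)
    f_z = z + np.outer(np.tanh(a), u)              # transformed samples
    psi = np.outer(1.0 - np.tanh(a) ** 2, w)       # derivative of tanh term
    log_det = np.log(np.abs(1.0 + psi @ u))        # log|det Jacobian|
    return f_z, log_det
```

Stacking several such layers lets the posterior over $z_2$ move away from a plain Gaussian while still allowing exact density evaluation.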
Datasets
Model Evaluation
Model Interpretability
We can also inspect the latent variable $z_2$ to see how the model separates the classes for the labeled data and whether it groups unlabeled points appropriately. We can make this comparison by using WandB's 2D Projection Plot to project $z_2$ down to two dimensions and then comparing the true labels to the model's predicted probabilities.
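One common way such 2D projections are computed is PCA. This NumPy sketch (the function name is our own, and this is not necessarily the exact projection WandB uses) projects latent samples onto their first two principal components:

```python
import numpy as np

def project_2d(z):
    """Project latent samples z of shape (n_samples, latent_dim) onto
    their first two principal components via SVD of the centered data."""
    z_centered = z - z.mean(axis=0)
    # Right singular vectors (rows of vt) are the principal axes,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(z_centered, full_matrices=False)
    return z_centered @ vt[:2].T        # (n_samples, 2)
```

Coloring the projected points first by the known labels and then by the model's predicted class probabilities makes disagreements between the two immediately visible.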