
Wellplate v8 Data

Running Keypoint RCNN on the new wellplate_v8 data
Created on January 19 | Last edited on January 20

Summary

We trained KeypointRCNN models on
  • wellplate_v8: 1k images with augmentation
  • wellplate_v8_no_flips: 1k images with augmentation, but no flips
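The exact training configuration isn't reproduced here, but for reference, this is a minimal sketch of how such a model is typically constructed with torchvision (the function name and weight choices below are illustrative, not our exact setup; requires torchvision >= 0.13 for the weights enums):

```python
import torch
import torchvision
from torchvision.models.detection import keypointrcnn_resnet50_fpn

def build_model(num_keypoints: int = 5) -> torch.nn.Module:
    """Keypoint R-CNN with an ImageNet-pretrained backbone.

    num_classes=2 -> background + wellplate; num_keypoints=5 matches the
    five wellplate keypoints discussed in this report.
    """
    return keypointrcnn_resnet50_fpn(
        weights=None,  # detection and keypoint heads trained from scratch
        weights_backbone=torchvision.models.ResNet50_Weights.IMAGENET1K_V1,
        num_classes=2,
        num_keypoints=num_keypoints,
    )
```

For wellplate_v8_no_flips, the only difference is that flip transforms are removed from the augmentation pipeline.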

For the wellplate_v8 data, although performance was OK, we saw many outliers where keypoints were either on top of each other or where three of them landed in one row on the irregular corner.

Best model: improved early stopping

It turns out that we can fix most of these outliers with a "best of" modification to the early stopping algorithm: we cache the best model seen so far during training and return that when early stopping triggers, instead of returning the latest one.
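A minimal sketch of this modification, assuming a generic training loop (the train_one_epoch and validate callables, the patience value, and the use of validation loss as the stopping criterion are all illustrative):

```python
import copy

def fit(model, train_one_epoch, validate, max_epochs=100, patience=10):
    """train_one_epoch(model) runs one epoch; validate(model) returns a val loss."""
    best_state, best_val, since_best = None, float("inf"), 0
    for _ in range(max_epochs):
        train_one_epoch(model)
        val_loss = validate(model)
        if val_loss < best_val:
            best_val, since_best = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())  # cache best weights
        else:
            since_best += 1
            if since_best >= patience:  # standard early stopping trigger
                break
    # The modification: restore the cached best weights rather than
    # returning the model as it was at the final (stopped) epoch.
    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```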
Performance:
TRAIN:
Mean average pixel diff: 5.085927486419678
Mean maximum pixel diff: 12.067157745361328
Mean Physical error (mm): inf
Median Physical error (mm): 32.93441678012827
VAL:
Mean average pixel diff: 2.9431703090667725
Mean maximum pixel diff: 8.263350486755371
Mean Physical error (mm): 44.315040893933016
Median Physical error (mm): 29.97718126701645
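For reference, here is one plausible reading of the two pixel-diff metrics (per-image mean and per-image max over the 5 keypoints, each averaged over the split). The exact definitions and the pixel-to-mm calibration behind the physical error aren't shown in this report, so this helper is an assumption:

```python
import numpy as np

def pixel_diff_stats(pred, gt):
    """pred, gt: (n_images, 5, 2) keypoint coordinates in pixels."""
    d = np.linalg.norm(pred - gt, axis=-1)   # (n_images, 5) per-keypoint error
    mean_avg = d.mean(axis=1).mean()         # mean over images of per-image mean
    mean_max = d.max(axis=1).mean()          # mean over images of per-image max
    return mean_avg, mean_max
```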
For this approach, the worst error in the validation set is this image:


The next-worst is this one:


This is much more acceptable than the extreme outliers we saw before.

Testing on the labeled evaluation set

We can test the best model (wellplate_v8_no_flips + modified early stopping) on the labeled well-call evaluation set.
TEST:
Mean average pixel diff: 2.782594919204712
Mean maximum pixel diff: 6.7867937088012695
Mean Physical error (mm): 44.09989582111534
Median Physical error (mm): 26.305298842705497
Let's plot the max pixel error (the maximum over the 5 keypoints, per image):


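The plot can be reproduced with something like the following (the saved-errors file and array layout are hypothetical):

```python
import matplotlib.pyplot as plt
import numpy as np

dists = np.load("test_pixel_errors.npy")   # hypothetical (n_images, 5) error dump
max_err = dists.max(axis=1)                # max over the 5 keypoints, per image

plt.hist(max_err, bins=30)
plt.xlabel("max pixel error over 5 keypoints")
plt.ylabel("number of images")
plt.show()
```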
We see no performance difference on the hold-out set, which is very different from what we observed on the wellplate_v7 data.


However, the train pixel error is a little higher than before. It looks like we've successfully lowered the variance at the price of a small amount of bias!


Deploying to Narrator

Overall Precision and Call Rates
call_bool         0.979167
precision_bool    0.635417
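For context, one way rates like these might be computed from a per-image results table (the file name and the boolean called/correct columns are hypothetical):

```python
import pandas as pd

df = pd.read_csv("narrator_eval_results.csv")  # hypothetical per-image results
call_rate = df["called"].mean()                # fraction of images with a call
precision = df.loc[df["called"], "correct"].mean()  # correct calls among calls made
print(pd.Series({"call_bool": call_rate, "precision_bool": precision}))
```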
We can run the pipette autocalibration method to get the standard deviation of the ground-truth pipette tip location (using -md 50 and running on the full evaluation set):
stddev pipette tip coord: [[5.830946 6.30021475 8.11659979]]
Compare this to the previous model's result:
stddev pipette tip coord: [[15.53159538 11.41254873 17.88187785]]
This also looks like we're moving in the right direction!
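The reported arrays read as a per-axis standard deviation over repeated tip estimates; a sketch of that computation, with a hypothetical dump of the estimates:

```python
import numpy as np

tips = np.load("pipette_tip_estimates.npy")  # hypothetical (n_runs, 3) xyz estimates
print("stddev pipette tip coord:", tips.std(axis=0, keepdims=True))  # shape (1, 3)
```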

Comments/Future directions:

  • We're still seeing a small number of images (around 10) in the training set where keypoints end up on top of each other (a simple check for this failure mode is sketched after this list). It would be interesting to see whether further fine-tuning of the model, or more likely a bigger and more diverse training set, could resolve this issue
  • We need to start holding out a test set, as the early stopping method is introducing a significant amount of overfitting
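As referenced above, a simple way to flag the on-top-of-each-other failure mode is a pairwise-distance check over the predicted keypoints (the 3-pixel threshold is an arbitrary assumption):

```python
import numpy as np

def has_collapsed_keypoints(kps: np.ndarray, min_dist: float = 3.0) -> bool:
    """kps: (5, 2) predicted keypoints in pixels."""
    d = np.linalg.norm(kps[:, None, :] - kps[None, :, :], axis=-1)  # (5, 5)
    d[np.diag_indices_from(d)] = np.inf  # ignore a keypoint's distance to itself
    return bool((d < min_dist).any())
```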

Charts