Wellplate v8 Data
Running Keypoint RCNN on the new wellplate_v8 data
Created on January 19 | Last edited on January 20
Summary
We trained KeypointRCNN models on:
- wellplate_v8: 1k images with augmentation
- wellplate_v8_no_flips: 1k images with augmentation, but no flips
For the wellplate_v8 data, performance was acceptable overall, but we saw a lot of outliers where keypoints were either stacked on top of each other or where three of them landed in one row on the irregular corner.
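As a point of reference, here is a minimal sketch of how such a model can be set up with torchvision's Keypoint R-CNN, assuming 2 classes (background + wellplate) and the 5 keypoints referenced in the error metrics below; the actual training script, dataset wrappers, and augmentation pipeline are not shown, and names here are illustrative.

```python
from torchvision.models import ResNet50_Weights
from torchvision.models.detection import keypointrcnn_resnet50_fpn

# Keypoint R-CNN with an ImageNet-pretrained ResNet-50 backbone; the detection
# and keypoint heads are freshly initialised for 2 classes and 5 keypoints.
model = keypointrcnn_resnet50_fpn(
    weights=None,                                  # heads trained from scratch
    weights_backbone=ResNet50_Weights.IMAGENET1K_V1,
    num_classes=2,
    num_keypoints=5,
)

# The wellplate_v8_no_flips run simply omits flips from the augmentation
# pipeline, since flips can confuse keypoints around the irregular corner.
```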


Best model: improved early stopping
It turns out that we can fix most of these outliers by using a "best of" modification to the early stopping algorithm: we cache the best model seen during training and return that when early stopping triggers, instead of returning the latest one.
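A minimal sketch of this "best of" early stopping, assuming a single lower-is-better validation metric; the class and variable names are illustrative, not the actual training code:

```python
import copy

class BestOfEarlyStopping:
    """Early stopping that keeps a copy of the best weights seen so far and
    restores them when training stops, instead of keeping the weights from
    the final (possibly degraded) epoch."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best_metric = float("inf")
        self.best_state = None
        self.bad_epochs = 0

    def step(self, model, val_metric: float) -> bool:
        """Record this epoch's validation metric; return True when training should stop."""
        if val_metric < self.best_metric:
            self.best_metric = val_metric
            self.best_state = copy.deepcopy(model.state_dict())
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

    def restore_best(self, model):
        """Load the cached best weights back into the model."""
        if self.best_state is not None:
            model.load_state_dict(self.best_state)
```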
Performance:
TRAIN:
Mean average pixel diff: 5.085927486419678
Mean maximum pixel diff: 12.067157745361328
Mean Physical error (mm): inf
Median Physical error (mm): 32.93441678012827

VAL:
Mean average pixel diff: 2.9431703090667725
Mean maximum pixel diff: 8.263350486755371
Mean Physical error (mm): 44.315040893933016
Median Physical error (mm): 29.97718126701645
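For reference, the pixel-diff metrics can be computed as below, under the assumption that "average" and "maximum" are taken over the 5 keypoints within each image and then averaged over images; this is a sketch of our reading of the metric, not the actual evaluation code.

```python
import numpy as np

def pixel_diff_stats(pred_kpts: np.ndarray, gt_kpts: np.ndarray):
    """pred_kpts, gt_kpts: (num_images, 5, 2) arrays of (x, y) keypoints.

    Returns (mean average pixel diff, mean maximum pixel diff), where the
    average/max is taken over keypoints per image, then averaged over images.
    """
    dists = np.linalg.norm(pred_kpts - gt_kpts, axis=-1)  # (N, 5) Euclidean pixel errors
    return dists.mean(axis=1).mean(), dists.max(axis=1).mean()
```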
For this approach, the worst error in the validation set is this image:

The next-worst is this one:

This is much more acceptable than the extreme outliers we saw before.
Testing on the labeled evaluation set
We can test the best model (wellplate_v8_no_flips + modified early stopping) on the labeled well-call evaluation set.
TEST:
Mean average pixel diff: 2.782594919204712
Mean maximum pixel diff: 6.7867937088012695
Mean Physical error (mm): 44.09989582111534
Median Physical error (mm): 26.305298842705497
Let's plot the max pixel error (the max over the 5 keypoints per image):
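A minimal plotting sketch, assuming pred_kpts and gt_kpts are the (N, 5, 2) keypoint arrays from the metric sketch above; the binning and labels are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Per-image max pixel error over the 5 keypoints (same definition as above).
max_errors = np.linalg.norm(pred_kpts - gt_kpts, axis=-1).max(axis=1)

plt.hist(max_errors, bins=30)
plt.xlabel("max keypoint pixel error per image")
plt.ylabel("number of images")
plt.show()
```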

We see essentially no performance drop on the held-out set relative to the validation set, which is very different from what we observed on the wellplate_v7 data.

However, the train pixel error is a little higher than before. It looks like we've successfully lowered the variance at the price of a small amount of bias!
Deploying to Narrator
Overall Precision and Call Rates
--------------------------------
call_bool         0.979167
precision_bool    0.635417
dtype: float64
--------------------------------
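The two numbers above appear to be a pandas Series of call rate and precision. A minimal sketch of how they could be computed, assuming call_bool is the fraction of wells the pipeline makes a call for and precision_bool is the fraction of those calls that match the label; this is a hypothetical helper, not the actual Narrator code.

```python
import pandas as pd

def call_and_precision(called: pd.Series, correct: pd.Series) -> pd.Series:
    """called: True where the pipeline made a well call;
    correct: True where a made call matches the label."""
    return pd.Series({
        "call_bool": called.mean(),                # call rate over all wells
        "precision_bool": correct[called].mean(),  # precision over made calls
    })
```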
We can run the pipette autocalibration method to estimate the spread (standard deviation) of the ground-truth pipette tip location (using -md 50 and running on the full evaluation set):
stddev pipette tip coord: [[5.830946 6.30021475 8.11659979]]
Compare this to the previous model's result:
stddev pipette tip coord: [[15.53159538 11.41254873 17.88187785]]
This also looks like we're moving in the right direction!
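For reference, the stddev vectors above are presumably just the per-axis standard deviation over repeated tip-location estimates; a minimal numpy sketch, assuming tip_coords is an (N, 3) array of estimated (x, y, z) tip positions from the autocalibration runs:

```python
import numpy as np

# tip_coords: (N, 3) array of estimated pipette tip positions (x, y, z),
# one row per autocalibration run on the evaluation set.
print("stddev pipette tip coord:", tip_coords.std(axis=0, keepdims=True))
```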
Comments/Future directions:
- We're still seeing a small number (around 10) of images in the training set where keypoints end up on top of each other. It would be interesting to see whether further fine-tuning of the model, or more likely a bigger and more diverse training set, could resolve this issue
- We need to start holding out a proper test set, since selecting the best checkpoint on the validation set means the modified early stopping introduces a significant amount of overfitting to that set
Links to Models/Datasets:
TODO
Charts