Mask-RCNN for x-ray images
Improving the mask-RCNN architecture to achieve better performance on x-ray images
Created on August 8 | Last edited on August 11
Summary of mask-RCNN
Architecture

The mask-RCNN is composed of 3 main parts:
- The FPN (Feature Pyramid Network): it produces multi-scale feature maps from the input images
- The RPN (Region Proposal Network): it extracts regions of interest (ROIs) from the feature maps
- The ROI Align: it pools a fixed-size feature for each ROI, from which the heads draw bounding boxes, predict masks, and classify the different ROIs
Losses
Improvements made on mask-RCNN
Changed NMS to soft-NMS
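Soft-NMS replaces the hard suppression of NMS with a score decay: instead of deleting a box that overlaps the current best one, it lowers its score in proportion to the overlap. A minimal NumPy sketch of the Gaussian variant (parameter names and defaults are illustrative, not our exact training code):

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS: decay scores of overlapping boxes instead of removing them."""
    scores = scores.astype(float).copy()
    keep = []
    idxs = np.arange(len(scores))
    while len(idxs) > 0:
        best = idxs[np.argmax(scores[idxs])]
        keep.append(best)
        idxs = idxs[idxs != best]
        if len(idxs) == 0:
            break
        overlaps = iou(boxes[best], boxes[idxs])
        scores[idxs] *= np.exp(-(overlaps ** 2) / sigma)  # Gaussian decay
        idxs = idxs[scores[idxs] > score_thresh]          # drop only near-zero scores
    return keep, scores

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [100, 100, 110, 110]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
keep, new_scores = soft_nms(boxes, scores)
# The overlapping box survives with a decayed score; the distant box is untouched.
```

This is why soft-NMS keeps more candidate boxes than classic NMS: a box is only dropped once its decayed score falls below the threshold.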
Changed IoU to DIoU
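DIoU augments plain IoU with a penalty on the normalized distance between box centers, which still distinguishes predictions even when boxes don't overlap at all (plain IoU is 0 for every non-overlapping pair). A small sketch of the metric following the published formula (illustrative, not our exact training code):

```python
def diou(box_a, box_b):
    """Distance-IoU between two boxes [x1, y1, x2, y2]: IoU - d^2 / c^2."""
    # standard IoU
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    iou = inter / (area_a + area_b - inter)
    # squared distance d^2 between the box centers
    cax, cay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cbx, cby = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    d2 = (cax - cbx) ** 2 + (cay - cby) ** 2
    # squared diagonal c^2 of the smallest box enclosing both
    ex1, ey1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    ex2, ey2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return iou - d2 / c2
```

Identical boxes give 1.0, and non-overlapping boxes give a negative value that grows more negative with distance, which is what makes DIoU a better regression signal than IoU.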
Experiment
Overview
Training
Run set: 4 runs
We've trained our models for 3 epochs over the training subset of PIDray. The losses start to stabilize after 2 or 3 epochs, but to know how long is enough we need to look at the validation losses.
Here, if we look at "val_losses", which is simply the sum of all losses, we can see that the models tend to overfit rather quickly: the best validation losses are often reached after 1 or 2 epochs.
That doesn't mean there is no point in training them for longer, though. Since we don't have the dataset that will be used in production, we didn't do much hyperparameter tuning. Changing a hyperparameter like the learning rate might lead to better performance at the cost of more epochs.
Finally, the losses and validation losses foreshadow our conclusions for the next sections: the models using soft-NMS (brown and beige) and the ones that don't (green and cyan) show different tendencies in loss.
Why is that? Soft-NMS keeps a lot more boxes during training, since boxes are filtered less aggressively. That's the point of the algorithm: keeping boxes that could be overlapping objects. But it also means more wrong guesses at training time, and thus worse losses where box precision is concerned. On the other hand, when it comes to box recall, soft-NMS does better and the corresponding losses improve.
Overall, soft-NMS gives worse losses, but what matters to us are the final metrics, which we discuss next.
IoU scores & classification accuracy
First of all, what does soft-NMS not change? The soft-NMS algorithm only affects which boxes are selected, so it has no impact on classification or on masks; the IoU score and accuracy score therefore don't change much between the different models.
Also, a note about IoU thresholds vs. IoU scores: on the IoU scores graph, each run has 3 plots for 3 IoU threshold values (0.5, 0.6 and 0.7). The threshold is how much IoU we require between a predicted box and the ground truth to assert they're the same. These thresholds are used in all the metric computations, and their point is to match the model's predictions to the ground truth. The higher the threshold, the more demanding we are of the model.
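To make the thresholds concrete, here is a toy check with illustrative numbers (not from our runs): a predicted box shifted by 2 pixels against a 10×10 ground truth has an IoU of about 0.67, so it counts as a match at thresholds 0.5 and 0.6 but not at 0.7.

```python
def iou(a, b):
    """IoU between two boxes in [x1, y1, x2, y2] format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

gt = [0, 0, 10, 10]
pred = [2, 0, 12, 10]            # same box, shifted right by 2 pixels
score = iou(gt, pred)            # 80 / 120 = 0.666...
for thresh in (0.5, 0.6, 0.7):
    print(thresh, score >= thresh)  # match at 0.5 and 0.6, miss at 0.7
```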
Precision/Recall curves
Here we can see the precision/recall curves. They show the tradeoff between recall (detecting all the objects) and precision (all detections are indeed objects). Where you want to be on the curve really depends on the problem: do you absolutely need to detect all objects, or do you need to be sure about what you're detecting?
In our problem, recall seems more important than precision. For example, if we used the soft-NMS DIoU model as is, we could reach a recall of 90% with about half of the detections being false positives.
To compare these curves as a whole, since they are 2-dimensional, we often use the mAP (mean average precision) metric, which can be seen as the area under the curve. With an IoU threshold of 0.5, we get a mAP of about 0.83 with soft-NMS, versus about 0.66 without.
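For reference, the AP for one class is computed by sweeping the score threshold and integrating precision over recall; mAP then averages AP over the classes. A small sketch of the all-point interpolation commonly used for this area-under-the-curve computation (illustrative, not our exact evaluation code):

```python
import numpy as np

def average_precision(recalls, precisions):
    """AP as area under the precision/recall curve (all-point interpolation).

    `recalls` must be sorted in increasing order, with `precisions` aligned.
    """
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    # make the precision envelope monotonically decreasing, right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # sum rectangle areas wherever recall increases
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# A detector keeping full precision up to recall 1.0 gets AP = 1.0;
# one that only ever reaches precision 0.5 gets AP = 0.5.
perfect = average_precision(np.array([0.5, 1.0]), np.array([1.0, 1.0]))
halved = average_precision(np.array([1.0]), np.array([0.5]))
```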
Training time
The soft-NMS algorithm comes with a tradeoff: computing time. Since we keep all boxes instead of deleting them as NMS does, there is a lot more computing to do, meaning longer training and inference times.
The models with NMS take about 2h30 to run the 3 epochs and compute the metrics, while the models using soft-NMS take about 8h30.