
Fine-tuning CLIP on RSICD

Created on July 22 | Last edited on August 3

Impact of image augmentations

Introducing image augmentations significantly reduces overfitting. The set of image augmentations used for this run is:
Other image augmentation strategies remain to be explored.
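As a starting point for such exploration, here is a minimal sketch of an augmentation pipeline built with torchvision. The specific transforms and all parameter values are illustrative assumptions and are not taken from this run; `build_train_augmentations` is a hypothetical helper name. Vertical flips are included on the assumption that overhead imagery has no canonical "up" direction.

```python
def build_train_augmentations(image_size=224):
    """Return a torchvision pipeline for PIL training images.

    Every transform and parameter below is an illustrative assumption,
    not the set used for the run discussed in this report.
    """
    # Imported lazily so this module loads even without torchvision installed.
    from torchvision import transforms

    return transforms.Compose([
        # Crop a random region covering 80-100% of the image, then resize.
        transforms.RandomResizedCrop(image_size, scale=(0.8, 1.0)),
        # Mirror left-right and top-bottom, each with probability 0.5.
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomVerticalFlip(p=0.5),
        # Mild photometric jitter to mimic lighting and sensor variation.
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    ])
```

The pipeline takes and returns PIL images, so it can be applied before whatever preprocessing the CLIP image encoder expects. Augmentations should be applied to the training split only, so that validation metrics stay comparable between runs.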

[Figure: two charts plotted against Step, comparing a run set of 2 runs]


Impact of text augmentations

The RSICD dataset contains 5 captions per image. For many images, however, these are a single caption repeated 5 times. We used back-translation [ref] to make the captions more diverse. Specifically, the MarianMT model [links] was used to generate the augmented captions.
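Back-translation can be sketched as follows: a caption is translated into a pivot language and then back into English, which tends to produce a paraphrase. This is a minimal sketch, not the exact procedure used here. The helper names (`make_marian_translator`, `back_translate`), the choice of French as the pivot, and the checkpoint names in the comments are assumptions.

```python
from typing import Callable, List

# A translator maps a batch of sentences to a batch of translated sentences.
Translator = Callable[[List[str]], List[str]]


def make_marian_translator(model_name: str) -> Translator:
    """Wrap a MarianMT checkpoint as a batch translation function."""
    # Imported lazily so the rest of the module works without transformers.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    def translate(texts: List[str]) -> List[str]:
        batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**batch)
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    return translate


def back_translate(
    captions: List[str], to_pivot: Translator, from_pivot: Translator
) -> List[str]:
    """Translate captions to a pivot language and back to the source language."""
    return from_pivot(to_pivot(captions))


# Example usage (downloads model weights; the pivot language is an assumption):
# en_to_fr = make_marian_translator("Helsinki-NLP/opus-mt-en-fr")
# fr_to_en = make_marian_translator("Helsinki-NLP/opus-mt-fr-en")
# augmented = back_translate(["many planes are parked at an airport"],
#                            en_to_fr, fr_to_en)
```

Because `back_translate` only depends on two callables, different pivot languages can be swapped in to obtain several distinct paraphrases of the same caption. A back-translated caption may come out identical to the original, so it is worth deduplicating before training.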


Authors