
Adobe Researchers Improve CLIP Grammar Understanding And Detail With New Dataset FineCapEval

Researchers at Adobe have come up with a new method to improve CLIP's grammar understanding and have trained a model that produces more finely detailed captions, alongside a new dataset.
Created on July 13 | Last edited on July 13
AI researchers at Adobe are showing off eleven research papers at the computing conference NAACL 2022. Among those papers, though unfortunately not being presented at the main conference, is a Findings paper entitled Fine-grained Image Captioning with CLIP Reward, which might be of interest to those keeping up with the recent and continuous cavalcade of text-to-image generation models like DALL·E and Craiyon.

This research paper introduces a method for fine-tuning the grammatical understanding of CLIP models. The researchers found that, when a captioning model is trained to generate captions for images with CLIP as its guide, the results can lack fine-grained detail and sometimes devolve into grammatical errors. Their improvements to the CLIP model address both problems. They then use the improved CLIP score as the reward in a reinforcement learning process to train a captioning model, which generates higher-quality and more detailed captions than other models.
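To make the idea of "CLIP score as reward" more concrete, here is a minimal sketch rather than the authors' implementation. It assumes the image and caption have already been turned into embedding vectors (the toy lists below stand in for real CLIP outputs), scores a caption by the clipped cosine similarity between the two, and uses a scaling weight of 2.5 as is common in CLIPScore-style metrics. The function names and the baseline subtraction, a common variance-reduction trick in reinforcement learning for captioning, are illustrative assumptions and are not confirmed by the article.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (plain lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def clip_score_reward(image_emb, caption_emb, weight=2.5):
    """CLIPScore-style reward: scaled cosine similarity, clipped at zero.

    The weight of 2.5 is an assumption borrowed from common CLIPScore usage.
    """
    return weight * max(cosine_similarity(image_emb, caption_emb), 0.0)


def baseline_adjusted_reward(image_emb, sampled_emb, baseline_emb):
    """Reward of a sampled caption minus the reward of a baseline caption.

    Hypothetical helper: a positive value means the sampled caption matched
    the image better than the baseline did.
    """
    return (clip_score_reward(image_emb, sampled_emb)
            - clip_score_reward(image_emb, baseline_emb))


# Toy embeddings standing in for real CLIP image/text features.
image = [1.0, 0.0]
good_caption = [1.0, 0.0]   # points the same way as the image
poor_caption = [0.0, 1.0]   # orthogonal to the image

print(clip_score_reward(image, good_caption))
print(baseline_adjusted_reward(image, good_caption, poor_caption))
```

In a real training loop, the adjusted reward would scale the log-probability of the sampled caption so that captions which match the image better than the baseline become more likely.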
This paper also comes with the release of a new dataset called FineCapEval, containing 1,000 images, each with 5 captions. The captions are also broken down into detail-specific areas, like background and relationship.
This dataset is useful for building and evaluating models that aim to caption images in more detail than many current models do. Compared to other image datasets, FineCapEval places a strong focus on fine-grained detail.
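As a rough picture of what one FineCapEval entry might look like in code, the sketch below models an image with its five captions and a mapping from detail area to the phrases that belong to it. The class name, field names, and sample values are all hypothetical and chosen for illustration; the actual file layout in the project repository may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FineCapRecord:
    """Hypothetical in-memory shape for one annotated image."""
    image_id: str
    captions: List[str]
    # Detail area (e.g. "background", "relationship") -> phrases for that area.
    details: Dict[str, List[str]] = field(default_factory=dict)


def has_expected_captions(record: FineCapRecord, expected: int = 5) -> bool:
    """Check that a record carries the expected number of captions."""
    return len(record.captions) == expected


# Made-up example values, not taken from the dataset.
record = FineCapRecord(
    image_id="example-0001",
    captions=[f"placeholder caption {i}" for i in range(5)],
    details={
        "background": ["a snowy mountain range"],
        "relationship": ["standing next to"],
    },
)

print(has_expected_captions(record))
print(sorted(record.details))
```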
Everything described in the paper, including pre-trained models and the FineCapEval dataset, is open-source and available at the project GitHub repository.
Code examples and a Google Colab are also provided.
Additionally, interactive demo webpages are available as a Hugging Face Gradio demo and on a Replicate page. The Replicate demo allows for more options and includes examples generated by various models to show off the team's improved model compared to others.


Tags: ML News