Hemm: Holistic Evaluation of Multi-modal Generative Models
Introduction
Recent text-to-image generation models can produce a wide range of high-fidelity images from natural-language prompts. However, composing objects with diverse attributes and relationships into a coherent scene remains a challenge. To address this, we present Hemm, a library for comprehensively evaluating text-to-image generation models on prompt comprehension.
Hemm builds on the logging and tracing capabilities of Weave and Weights & Biases to perform comprehensive, apples-to-apples evaluations of text-to-image generation models using several state-of-the-art metrics for prompt comprehension proposed in recent research.
Quickstart
Installation
You can clone and install Hemm using the following commands:
git clone https://github.com/wandb/Hemm
cd Hemm
pip install -e ".[core]"
Publish a Weave Dataset for Evaluation
First, you need to publish your evaluation dataset to Weave. Check out this tutorial, which shows you how to publish a dataset to your project.
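As a minimal sketch, you can publish a small prompt dataset with the Weave SDK. The project name and the "prompt" column key below are illustrative; use the keys your metrics expect.

import weave

# Illustrative project name -- use your own W&B project
weave.init(project_name="image-quality-leaderboard")

# A tiny evaluation dataset; each row is a plain dict
dataset = weave.Dataset(
    name="COCO",
    rows=[
        {"prompt": "a photo of a red bicycle leaning against a brick wall"},
        {"prompt": "two cats sitting on a wooden bench in a park"},
    ],
)

# Publishing assigns a version, e.g. "COCO:v0", which you can reference later
weave.publish(dataset)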
Exploring a Weave Dataset on the UI
Running the Evaluations
Once you have a dataset in your Weave project, you can evaluate a text-to-image generation model on your chosen metrics.
import wandb
import weave

from hemm.eval_pipelines import BaseDiffusionModel, EvaluationPipeline
from hemm.metrics.image_quality import LPIPSMetric, PSNRMetric, SSIMMetric

# Initialize Weave and WandB
wandb.init(project="image-quality-leaderboard", job_type="evaluation")
weave.init(project_name="image-quality-leaderboard")

# Initialize the diffusion model to be evaluated as a `weave.Model` using `BaseDiffusionModel`.
# The `BaseDiffusionModel` class uses a `diffusers.DiffusionPipeline` under the hood.
# You can write your own `weave.Model` if your model is not diffusers-compatible
# (see the sketch after this block).
model = BaseDiffusionModel(diffusion_model_name_or_path="CompVis/stable-diffusion-v1-4")

# Add the model to the evaluation pipeline
evaluation_pipeline = EvaluationPipeline(model=model)

# Add the PSNR metric to the evaluation pipeline
psnr_metric = PSNRMetric(image_size=evaluation_pipeline.image_size)
evaluation_pipeline.add_metric(psnr_metric)

# Add the SSIM metric to the evaluation pipeline
ssim_metric = SSIMMetric(image_size=evaluation_pipeline.image_size)
evaluation_pipeline.add_metric(ssim_metric)

# Add the LPIPS metric to the evaluation pipeline
lpips_metric = LPIPSMetric(image_size=evaluation_pipeline.image_size)
evaluation_pipeline.add_metric(lpips_metric)

# Get the Weave dataset reference
dataset = weave.ref("COCO:v0").get()

# Evaluate!
evaluation_pipeline(dataset=dataset)
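Hemm also ships prompt-alignment metrics such as CLIPScoreMetric and CLIPImageQualityScoreMetric. As a rough sketch, you can add them to the same pipeline; the default constructor arguments here are an assumption, so check the Hemm documentation for the exact signatures.

from hemm.metrics.prompt_alignment import CLIPImageQualityScoreMetric, CLIPScoreMetric

# Assumption: both metrics can be constructed with default arguments;
# consult the Hemm documentation for the exact signatures.
clip_score_metric = CLIPScoreMetric()
evaluation_pipeline.add_metric(clip_score_metric)

clip_iqa_metric = CLIPImageQualityScoreMetric()
evaluation_pipeline.add_metric(clip_iqa_metric)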
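If your model is not diffusers-compatible, you can wrap it in your own `weave.Model`. Below is a minimal, illustrative sketch of the general pattern; the exact interface Hemm expects from a custom model may differ, so treat this as a starting point rather than a drop-in implementation.

import weave
from PIL import Image

class CustomTextToImageModel(weave.Model):
    # Configuration is declared as typed fields (`weave.Model` is pydantic-based)
    model_name: str

    @weave.op()
    def predict(self, prompt: str) -> Image.Image:
        # Placeholder: replace with a call to your own generation backend
        return Image.new("RGB", (512, 512))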
The Weave Evaluation UI
Hemm Leaderboards