EditGAN: High-Precision Semantic Image Editing

Robust and high-precision semantic image editing in real-time

Introduction

In recent times, Generative Adversarial Networks (GANs) have been adopted widely for image editing applications, helping streamline the workflow of photographers and content creators and enabling new levels of creativity and digital artistry. GANs have found their way into consumer software in the form of neural photo editing filters and the deep learning research community is actively developing further techniques.
However, most GAN-based image editing methods have notable drawbacks. They:
  • require large-scale datasets with semantic segmentation annotations for training
  • only provide high-level control
  • simply interpolate between different images
Researchers at NVIDIA Toronto Artificial Intelligence Lab propose a novel approach called EditGAN for high-quality, high-precision semantic image editing that attempts to tackle the drawbacks of existing GAN-based image editing approaches. EditGAN allows users to edit images by modifying their highly detailed part segmentation masks, such as drawing a new mask for the headlight of a car! Check it out:

[Figure: EditGAN in Action]



NB: This post was written as a W&B Report. If you’re running ML experiments, you can use W&B to automatically track all of your results and logs, and with Reports, you can easily share your analysis with your team. Here’s a minimal snippet to get started:
import wandb
# Start a run and store hyperparameters in its config
wandb.init(project='editGAN', config={'lr': 0.01, 'nepochs': 10})
for epoch in range(10):
    loss = 0.0  # replace with your actual training loss
    wandb.log({'loss': loss})

Existing Approaches––and Their Drawbacks

Most GAN-based image editing methods fall into a few categories:
  • Some methods, such as MaskGAN, Cascade EF-GAN, SEAN, and DeepFaceDrawing, rely on GANs conditioned on class labels or pixel-wise semantic segmentation annotations, where different conditionings lead to modifications in the output, while others use auxiliary attribute classifiers to guide synthesis and edit images. However, such methods have a few drawbacks:
    • Training these conditional GANs or external classifiers requires large annotated datasets. Therefore, these methods are currently limited to image types for which large annotated datasets are available (like portraits).
    • Furthermore, even if annotations are available, most techniques offer only limited editing control, since these annotations usually consist only of high-level global attributes or relatively coarse pixel-wise segmentation.
  • Methods such as Editing in Style, SEAN, and VOGUE focus on mixing and interpolating features from different images, thereby requiring reference images as editing targets and usually also not offering fine control.
  • Methods such as InterFaceGAN, GAN Dissection, and GANSpace carefully analyze and dissect the latent spaces of GANs to find disentangled latent variables suitable for editing, while others control the GANs' network parameters directly. These approaches are not user-friendly and cannot provide users with the desired level of flexibility and detail.

The Novelty of EditGAN

EditGAN is a novel GAN-based image editing framework that enables high-precision semantic image editing by allowing users to modify detailed object part segmentations.
The authors of EditGAN build on DatasetGAN, which jointly models both images and their semantic segmentations based on the same underlying latent code and requires as few as 16 labeled examples. This allows EditGAN to scale to many object classes and choices of part labels without depending on large annotated datasets.
Editing is performed by modifying the segmentation mask according to the desired edit and optimizing the latent code to be consistent with the new mask, which effectively changes the RGB image. To make editing more efficient, EditGAN learns editing vectors in latent space that realize the edits. These editing vectors can then be applied directly to other images, with few or no additional optimization steps.
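To make this concrete, below is a minimal sketch of what this mask-conditioned latent optimization could look like in PyTorch. It is an illustration under assumptions rather than the authors' implementation: generator and seg_head are hypothetical handles to the shared-latent image and segmentation branches of the DatasetGAN-style model, and the paper's full objective (which includes perceptual terms) is simplified to a masked cross-entropy plus a masked RGB-consistency term.
import torch
import torch.nn.functional as F

def learn_editing_vector(generator, seg_head, w, img_orig, target_mask,
                         region, steps=100, lr=0.05):
    # `target_mask` is the user-edited part segmentation (1, H, W), long dtype;
    # `region` is a float mask (1, H, W): 1 inside the edited area, 0 outside.
    delta = torch.zeros_like(w, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        img = generator(w + delta)      # (1, 3, H, W) RGB image
        logits = seg_head(w + delta)    # (1, C, H, W) part-segmentation logits
        # Match the edited mask inside the region being edited...
        ce = F.cross_entropy(logits, target_mask, reduction='none')
        loss_seg = (ce * region).mean()
        # ...while keeping the RGB image unchanged outside of it.
        loss_rgb = (((img - img_orig) ** 2).mean(dim=1) * (1 - region)).mean()
        opt.zero_grad()
        (loss_seg + loss_rgb).backward()
        opt.step()
    return delta.detach()  # a reusable "editing vector" in latent space
Because delta lives in latent space rather than pixel space, it can be cached and added to the latent codes of other images to perform the same semantic edit.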

[Figure: EditGAN with Pre-determined Editing Vectors]


Significant Advantages of EditGAN

  • EditGAN offers a very high-precision editing experience without relying on careful analysis and dissection of latent spaces, as InterFaceGAN, GAN Dissection, and GANSpace do.
  • Unlike MaskGAN, Cascade EF-GAN, SEAN, and DeepFaceDrawing, it requires very little annotated training data and does not rely on external classifiers.
  • EditGAN can be run interactively in real-time.
  • Because edits are learned as vectors in latent space, multiple edits compose straightforwardly (see the sketch after this list).
  • It works on real embedded, GAN-generated, and even out-of-domain images.
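Since edits are amortized into latent-space vectors, composing several of them reduces to simple vector arithmetic. A minimal sketch, where v_smile and v_gaze stand for hypothetical editing vectors learned as above and generator is the same hypothetical model handle:
# Scale each learned editing vector and add it to the image's latent code
w_edited = w + 0.8 * v_smile + 1.2 * v_gaze
edited_image = generator(w_edited)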



Okay. So How Does EditGAN Work?

Experiments

The authors extensively evaluate EditGAN on images across four different categories:
  • Car images with a spatial resolution of 384×512
  • Bird images with a spatial resolution of 512×512
  • Cat images with a spatial resolution of 256×256
  • Face images with a spatial resolution of 1024×1024

Qualitative Results

Examples of Segmentation-driven Edits with EditGAN

The following results are based on editing with editing vectors and 30 steps of self-supervised refinement. They demonstrate that the editing operations preserve high image quality and are well disentangled for all classes.
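As a rough sketch of what editing with a vector followed by self-supervised refinement could look like, reusing the hypothetical learn_editing_vector helper from earlier: shift the latent code by the scaled vector, then run a handful of the same masked optimization steps to clean up residual artifacts (no new annotations are needed, hence self-supervised):
def apply_with_refinement(generator, seg_head, w, delta, img_orig,
                          target_mask, region, scale=1.0, refine_steps=30):
    # Apply the amortized edit, then briefly re-optimize from the shifted code
    w_shifted = w + scale * delta
    residual = learn_editing_vector(generator, seg_head, w_shifted, img_orig,
                                    target_mask, region, steps=refine_steps)
    return generator(w_shifted + residual)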

[Figure: Examples of segmentation-driven edits with EditGAN]

Note that several of these editing operations generate plausible manipulated images unlike any appearing in the GAN training data. For example, the training data does not include cats with overly large eyes or ears; nevertheless, EditGAN achieves such edits in high quality.

Examples of Combining Multiple Edits

The following results demonstrate combinations of multiple edits with editing vectors and 30 steps of self-supervised refinement.

[Figure: Examples of Combining Multiple Edits on Face Images]


Examples of High-Precision Editing

The following results demonstrate that using EditGAN, we can perform extremely high-precision edits, such as rotating a car’s wheel spoke or dilating pupils. EditGAN can be used to edit semantic parts of objects that consist of only a few pixels.

[Figure: Examples of High-Precision Editing]


Large-Scale Modifications

EditGAN can also be used to perform large-scale modifications, such as removing the entire roof of a car or converting it into a station wagon-like vehicle, simply by modifying the segmentation mask accordingly and optimizing. The following panel demonstrates this operation.

[Figure: Large-scale Modifications]


Out-of-Domain Results

The authors demonstrate the generalization capability of EditGAN to out-of-domain data on the MetFaces Dataset. They use the EditGAN model trained on the FFHQ Dataset and create the editing vectors using in-domain real faces. The out-of-domain MetFaces portraits are then embedded with 100 steps of optimization, and the editing vectors are applied with 30 steps of self-supervised refinement. As the results in the panel below show, the editing operations translate seamlessly even to such far out-of-domain examples.
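Embedding (inverting) a real portrait into the GAN's latent space is itself a small optimization problem. Below is a minimal sketch under simplifying assumptions: the losses typically used for inversion (e.g. perceptual terms) are reduced here to a plain pixel-wise reconstruction, and generator is again a hypothetical handle:
import torch
import torch.nn.functional as F

def embed_image(generator, target, w_init, steps=100, lr=0.02):
    # Optimize a latent code until the generator reproduces the target image
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        loss = F.mse_loss(generator(w), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()  # latent code to which editing vectors can be applied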

[Figure: Combinations of Multiple Edits on Out-of-Domain Images]

The MetFaces Dataset is an image dataset of human faces extracted from works of art, originally created as part of the paper Training Generative Adversarial Networks with Limited Data.

Challenging Editing Operations

The authors demonstrate challenging editing operations with EditGAN in which semantically related parts are edited in a disentangled way. The results presented in the following panel correspond to pure optimization-based editing.

[Figure: Challenging Editing Operations]


Beak Size Editing

The following panel demonstrates a specific application of editing the beak size of birds. As shown below, the editing vectors learned from just 2 pairs of images and segmentation masks are used to achieve beak size editing.

[Figure: Beak Size Editing]


License Plate Removal

The following panel demonstrates another specific application: removing the license plate from cars. As shown below, editing vectors learned from just 2 pairs of images and segmentation masks are used to achieve license plate removal.

[Figure: License Plate Removal]


Quantitative Results

In order to quantitatively measure EditGAN’s image editing capabilities, the smile edit benchmark introduced by MaskGAN is used. Faces with neutral expressions are converted into smiling faces and performance is measured by three metrics:
  • Semantic Correctness: Using a pre-trained smile attribute classifier, it is measured whether the faces show smiling expressions after editing.
  • Distribution-level Image Quality: Fréchet Inception Distance and Kernel Inception Distance are calculated between 400 edited test images and the CelebA-HD test dataset.
  • Identity Preservation: Using the pre-trained ArcFace feature extraction network, it is measured whether the identity of the subjects is maintained after editing, computed as the cosine similarity between ArcFace embeddings of the original and edited images (see the sketch after this list).
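For instance, the identity-preservation score could be computed along these lines, where arcface is a hypothetical handle to the pre-trained feature extractor:
import torch.nn.functional as F

def identity_score(arcface, originals, edited):
    # Cosine similarity between ArcFace embeddings of the original and edited
    # faces, averaged over the test set; higher means identity is preserved
    feats_orig, feats_edit = arcface(originals), arcface(edited)
    return F.cosine_similarity(feats_orig, feats_edit, dim=-1).mean()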

[Figure: Quantitative Results]

Higher attribute accuracy and ID scores, together with lower FID and KID scores, correspond to better and more robust image editing.




Key Impacts of EditGAN

Advantages

  • Where previous generative-modeling-based image editing methods offer only limited high-level editing capabilities, EditGAN provides users with unprecedented high-precision semantic editing possibilities.
  • The techniques proposed in the paper can be used for artistic purposes and creative expression, benefiting designers, photographers, and content creators. Such AI-driven image editing tools can potentially democratize high-quality image editing.
  • On a larger scale, the ability to synthesize data with specific attributes can be leveraged in training and fine-tuning machine learning models.

Disadvantages




Conclusion

  • We discussed the paper EditGAN: High-Precision Semantic Image Editing, which proposes a novel method for high-precision, high-quality semantic image editing.
  • EditGAN relies on a GAN that jointly models RGB images and their pixel-wise semantic segmentation maps and that requires only very few annotated data for training.
  • Editing is achieved by performing optimization in latent space while conditioning on edited segmentation masks.
  • This optimization can be amortized into editing vectors in latent space, which can be applied directly to other images, allowing for real-time interactive editing with little or no further optimization.
  • The authors demonstrate a broad variety of editing operations on different kinds of images, achieving an unprecedented level of flexibility and freedom in terms of editing while preserving high image quality.


