EditGAN: High-Precision Semantic Image Editing
Robust and high-precision semantic image editing in real-time
Introduction
In recent years, Generative Adversarial Networks (GANs) have been widely adopted for image editing applications, streamlining the workflows of photographers and content creators and enabling new levels of creativity and digital artistry. GANs have found their way into consumer software in the form of neural photo-editing filters, and the deep learning research community is actively developing further techniques.
However, most GAN-based image editing methods have certain drawbacks. They:
- require large-scale datasets with semantic segmentation annotations for training
- only provide high-level control
- simply interpolate between different images
Researchers at NVIDIA Toronto Artificial Intelligence Lab propose a novel approach called EditGAN for high-quality, high-precision semantic image editing that attempts to tackle the drawbacks of existing GAN-based image editing approaches. EditGAN allows users to edit images by modifying their highly detailed part segmentation masks, such as drawing a new mask for the headlight of a car! Check it out:
EditGAN in Action
NB: This post was written as a W&B Report. If you’re running ML experiments, you can use W&B to automatically track all of your results and logs, and with Reports, you can easily share your analysis with your team. Here’s a snippet of pseudocode to get started:
```python
import wandb

# Initialize a W&B run and track hyperparameters
run = wandb.init(project='editGAN', config={'lr': 0.01, 'nepochs': 10})

for epoch in range(run.config['nepochs']):
    loss = 1.0 / (epoch + 1)  # placeholder loss; replace with your training step
    wandb.log({'loss': loss})
```
Existing Approaches and Their Drawbacks
Most GAN-based image editing methods fall into a few categories:
- Some methods, such as MaskGAN, Cascade EF-GAN, SEAN, and DeepFaceDrawing, rely on GANs conditioned on class labels or pixel-wise semantic segmentation annotations, where different conditionings lead to modifications in the output; others use auxiliary attribute classifiers to guide synthesis and edit images. However, such methods have a few drawbacks:
- Training these conditional GANs or external classifiers requires large annotated datasets. Therefore, these methods are currently limited to image types for which large annotated datasets are available (like portraits).
- Furthermore, even if annotations are available, most techniques offer only limited editing control, since these annotations usually consist only of high-level global attributes or relatively coarse pixel-wise segmentation.
- Methods such as Editing in Style, SEAN, and VOGUE focus on mixing and interpolating features from different images, thereby requiring reference images as editing targets and usually also not offering fine control.
- Methods such as InterfaceGAN, GAN Dissection, and GANSpace carefully analyze and dissect the latent space of the GAN to find disentangled latent variables suitable for editing, or focus on controlling the network parameters of the GAN. Such approaches are not user-friendly and cannot provide users with the desired level of flexibility and detail.
The Novelty of EditGAN
EditGAN is a novel GAN-based image editing framework that enables high-precision semantic image editing by allowing users to modify detailed object part segmentations.
The authors of EditGAN build on DatasetGAN, which jointly models images and their semantic segmentations based on the same underlying latent code and requires as few as 16 labeled examples. This allows EditGAN to scale to many object classes and choices of part labels without depending on large annotated datasets.
Editing is performed by modifying the segmentation mask according to the desired edit and optimizing the latent code to be consistent with the new segmentation mask, which effectively changes the RGB image. To make the editing process more efficient, EditGAN learns editing vectors in latent space that realize the edits. These editing vectors can be applied directly to other images, with no or only a few additional optimization steps.
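To make the mechanics concrete, here is a minimal PyTorch-style sketch of the optimization step described above. The function names, tensor shapes, and loss weights are illustrative assumptions rather than the authors' actual implementation, which also uses perceptual losses and a more careful treatment of the editing region:

```python
import torch
import torch.nn.functional as F

def edit_by_mask(generator, seg_branch, w_init, edited_mask, region,
                 original_image, steps=100, lr=0.01, rgb_weight=10.0):
    """Optimize a latent code so the model's segmentation matches the
    user-edited mask while pixels outside the edited region stay fixed.

    Assumed shapes: generator(w) -> (N, 3, H, W) images,
    seg_branch(w) -> (N, C, H, W) part-segmentation logits,
    edited_mask -> (N, H, W) integer labels, region -> (N, H, W) float
    mask that is 1 where the user edited.
    """
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        image = generator(w)        # RGB render from the latent code
        seg_logits = seg_branch(w)  # predicted part segmentation
        # Push the predicted segmentation toward the edited mask
        # inside the region the user touched...
        seg_loss = (F.cross_entropy(seg_logits, edited_mask,
                                    reduction='none') * region).mean()
        # ...while keeping untouched pixels close to the original image.
        rgb_loss = (((image - original_image) ** 2)
                    * (1.0 - region).unsqueeze(1)).mean()
        loss = seg_loss + rgb_weight * rgb_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    # The latent-space displacement realizes the edit and can be stored
    # as a reusable editing vector.
    return w.detach(), (w - w_init).detach()
```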
EditGAN with Pre-determined Editing Vectors
Significant Advantages of EditGAN
- EditGAN offers very high-precision editing without relying on careful analysis and dissection of the latent space, as InterfaceGAN, GAN Dissection, and GANSpace do.
- It requires very little annotated training data and does not rely on external classifiers, unlike MaskGAN, Cascade EF-GAN, SEAN, and DeepFaceDrawing.
- EditGAN can be run interactively in real-time.
- It allows for straightforward compositionality of multiple edits by learning editing vectors in latent space (see the sketch after this list).
- It works on real images embedded in the GAN's latent space, on GAN-generated images, and even on out-of-domain images.
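Because each edit lives in latent space as a vector, composing several edits conceptually amounts to adding scaled editing vectors to an image's latent code. A minimal sketch, under the same illustrative assumptions as above (`generator` and the editing vectors are placeholders):

```python
import torch

def apply_edits(w, editing_vectors, scales):
    """Compose multiple edits by adding scaled editing vectors
    to an image's latent code (conceptual sketch)."""
    w_edit = w.clone()
    for v, s in zip(editing_vectors, scales):
        w_edit = w_edit + s * v  # each scale controls the edit's strength
    return w_edit

# Hypothetical usage: a full-strength smile edit plus a half-strength gaze edit.
# edited_image = generator(apply_edits(w, [v_smile, v_gaze], [1.0, 0.5]))
```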
Okay. So How Does EditGAN Work?
In short: an image is first embedded into the GAN's latent space; the user then edits the image's detailed part segmentation mask; and the latent code is optimized so that the generated segmentation matches the edited mask while the rest of the image stays fixed. The resulting latent-space shift can be saved as an editing vector and reused on new images.
Experiments
The authors extensively evaluate EditGAN on images across four different categories:
- Car images at a spatial resolution of 384×512
- Bird images at a spatial resolution of 512×512
- Cat images at a spatial resolution of 256×256
- Face images at a spatial resolution of 1024×1024
Qualitative Results
Examples of Segmentation-driven Edits with EditGAN
The following results are based on editing with editing vectors and 30 steps of self-supervised refinement. They demonstrate that the editing operations preserve high image quality and are well disentangled for all classes.
Note that several of these editing operations generate plausible manipulated images unlike any appearing in the GAN training data. For example, the training data does not include cats with overly large eyes or ears. Nevertheless, EditGAN achieves such edits in a high-quality manner.
Examples of Combining Multiple Edits
The following results demonstrate combinations of multiple edits with editing vectors and 30 steps of self-supervised refinement.
Examples of Combining Multiple Edits on Face Images
Examples of High-Precision Editing
The following results demonstrate that EditGAN can perform extremely high-precision edits, such as rotating a car's wheel spokes or dilating pupils. EditGAN can edit semantic parts of objects that consist of only a few pixels.
Large-Scale Modifications
EditGAN can also be used to perform large-scale modifications, such as removing the entire roof of a car or converting it into a station wagon-like vehicle, simply by modifying the segmentation mask accordingly and optimizing. The following panel demonstrates this operation.
Out-of-Domain Results
The authors demonstrate the generalization capability of EditGAN on out-of-domain data from the MetFaces dataset. They use the EditGAN model trained on the FFHQ dataset and create the editing vectors using in-domain real faces. The out-of-domain MetFaces portraits are then embedded with 100 steps of optimization, and the editing vectors are applied with 30 steps of self-supervised refinement. As the results in the panel below show, the editing operations translate seamlessly even to such far out-of-domain examples.
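The recipe in this paragraph can be sketched in a few lines. Here, `embed` stands in for the GAN-inversion step (optimizing a latent code to reconstruct the input image), and `refine` for the brief self-supervised refinement; both helpers and their losses are hypothetical stand-ins, not the authors' code:

```python
import torch

def embed(generator, target_image, w_avg, steps=100, lr=0.01):
    """GAN-inversion sketch: optimize a latent code so the generator
    reconstructs the target image (perceptual losses omitted here)."""
    w = w_avg.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        loss = ((generator(w) - target_image) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()

# Hypothetical out-of-domain editing, following the recipe above:
# w = embed(generator, metfaces_portrait, w_avg, steps=100)  # 100-step embedding
# w_edit = w + v_edit                  # apply a pre-learned editing vector
# w_edit = refine(w_edit, steps=30)    # 30 steps of self-supervised refinement
```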
Combinations of Multiple Edits on Out-of-Domain Images
The MetFaces Dataset is an image dataset of human faces extracted from works of art, originally created as part of the paper Training Generative Adversarial Networks with Limited Data.
Challenging Editing Operations
The authors also demonstrate challenging editing operations in which EditGAN disentangles semantically closely related parts. The results presented in the following panel correspond to purely optimization-based editing.
Beak Size Editing
The following panel demonstrates a specific application of editing the beak size of birds. As shown below, the editing vectors learned from just 2 pairs of images and segmentation masks are used to achieve beak size editing.
License Plate Removal
The following panel demonstrates another specific application: removing license plates from cars. As shown below, editing vectors learned from just 2 pairs of images and segmentation masks are used to achieve license plate removal.
Quantitative Results
To quantitatively measure EditGAN's image editing capabilities, the authors use the smile edit benchmark introduced by MaskGAN. Faces with neutral expressions are converted into smiling faces, and performance is measured by three metrics (a code sketch of all three follows the list):
- Semantic Correctness: Using a pre-trained smile attribute classifier, it is measured whether the faces show smiling expressions after editing.
- Distribution-level Image Quality: Fréchet Inception Distance and Kernel Inception Distance are calculated between 400 edited test images and the CelebA-HD test dataset.
- Identity Preservation: Using the pre-trained ArcFace feature extraction network, identity preservation is measured as the cosine similarity between the ArcFace features of the original and edited images.
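A rough sketch of how such metrics can be computed with off-the-shelf tools, using torchmetrics for FID and KID; the `smile_classifier` and `arcface` networks are assumed to be available pre-trained models and are not part of EditGAN itself:

```python
import torch.nn.functional as F
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance

def evaluate_smile_edit(edited, reference, originals, smile_classifier, arcface):
    """edited/reference/originals: uint8 image batches of shape (N, 3, H, W)."""
    # 1. Semantic correctness: fraction of edited faces classified as smiling
    #    (assumes the classifier outputs a smile probability per image).
    smile_acc = (smile_classifier(edited.float()) > 0.5).float().mean()

    # 2. Distribution-level quality: FID and KID between the edited images
    #    and the reference set (e.g., the CelebA-HD test split).
    fid = FrechetInceptionDistance(feature=2048)
    kid = KernelInceptionDistance(subset_size=100)
    for metric in (fid, kid):
        metric.update(reference, real=True)
        metric.update(edited, real=False)
    kid_mean, _ = kid.compute()

    # 3. Identity preservation: cosine similarity between ArcFace features
    #    of each original image and its edited counterpart.
    id_sim = F.cosine_similarity(arcface(originals.float()),
                                 arcface(edited.float())).mean()

    return smile_acc.item(), fid.compute().item(), kid_mean.item(), id_sim.item()
```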
Higher attribute accuracy and ID scores, and lower FID and KID scores, correspond to better and more robust image editing.
Key Impacts of EditGAN
Advantages
- Where previous generative-modeling-based image editing methods offer only limited, high-level editing capabilities, EditGAN provides users with unprecedented high-precision semantic editing possibilities.
- The techniques proposed in the paper can be used for artistic purposes and creative expression, benefiting designers, photographers, and content creators. Such AI-driven image editing tools can potentially democratize high-quality image editing.
- On a larger scale, the ability to synthesize data with specific attributes can be leveraged in training and fine-tuning machine learning models.
Disadvantages
- The high-precision photo editing offered by EditGAN can potentially be used for advanced photo manipulation for nefarious purposes. The recent progress of generative models and AI-driven photo editing has profound implications for image authenticity and beyond. As one way to tackle these challenges, the research community has already been making significant efforts to develop methods for automatically validating real images and detecting manipulated or fake images.
- Generative models like EditGAN are usually only as good as the data they were trained on. Biases in the underlying datasets are therefore still present in the synthesized images and are preserved even when applying the proposed editing methods. It is important to be aware of such biases in the underlying data and to counteract them, for example by actively collecting more representative data or by using bias-correction methods.
Conclusion
- We discussed the paper EditGAN: High-Precision Semantic Image Editing, which proposes a novel method for high-precision, high-quality semantic image editing.
- EditGAN relies on a GAN that jointly models RGB images and their pixel-wise semantic segmentation maps and that requires only a handful of annotated examples for training.
- Editing is achieved by performing optimization in latent space while conditioning on edited segmentation masks.
- This optimization can be amortized into editing vectors in latent space, which can be applied directly to other images, allowing for real-time interactive editing with little or no further optimization.
- The authors demonstrate a broad variety of editing operations on different kinds of images, achieving an unprecedented level of flexibility and freedom in terms of editing while preserving high image quality.