Rewriting a Deep Generative Model: An Overview

In this report, we will explore the work presented in the paper "Rewriting a Deep Generative Model" by Bau et al. The paper offers a new way of looking at deep neural networks.
Ayush Thakur

Introduction

In the words of the authors,

"Deep network training is a blind optimization procedure where programmers define objectives but not the solutions that emerge. In this paper, we ask if deep networks can be created in a different way. Can the rules in a network be directly rewritten?"

The Paper | The Code | Interactive Google Colab →

This blind optimization procedure is investigated quantitatively in the paper "Deep Ensembles: A Loss Landscape Perspective" by Fort et al. Sayak Paul and I explored that paper in a separate report.

The usual recipe for creating a deep neural network is to train a model on a massive dataset with a defined objective function. This takes considerable time and is expensive in most cases. The authors of "Rewriting a Deep Generative Model" propose a method to create new deep networks by rewriting the rules of an existing pre-trained network, as shown in Figure 1. By doing so, they aim to let novice users easily modify and customize a model without the training time and computational cost of large-scale machine learning.


-> Figure 1: Rewriting a GAN without training to remove a watermark, add people, and replace a tower with a tree. (Source) <-

They do so by setting up a new problem statement: manipulation of specific rules encoded by a deep generative model.

If you are unfamiliar with deep generative models, here is my take on them.

So why is rewriting a deep generative model useful?

Deep generative models such as GANs can learn rich semantic and physical rules about a target distribution (faces, etc.). However, training a state-of-the-art GAN on a dataset usually takes weeks. If the target distribution changes slightly, retraining the GAN from scratch wastes resources. What if we could instead directly change some of a trained GAN's rules to reflect the change in the target distribution? Thus, by rewriting a GAN:

An Overview of the Paper

The usual practice is to train a new model for every new version (a slightly different target distribution) of the same dataset. By rewriting a specific rule without affecting the other rules captured by the model, much of that training time and compute is saved.

How can we edit generative models? In the words of the authors,

"..we show how to generalize the idea of a linear associative memory to a nonlinear convolutional layer of a deep generator. Each layer stores latent rules as a set of key-value relationships over hidden features. Our constrained optimization aims to add or edit one specific rule within the associative memory while preserving the existing semantic relationships in the model as much as possible. We achieve it by directly measuring and manipulating the model's internal structure, without requiring any new training data."

To specify the rules, the authors provide an easy-to-use interface. The video linked above clearly explains how to use it.

Try out the interface in Google Colab $\rightarrow$

1. Change Rule with Minimal Collateral Damage

  1. We start with a pre-trained GAN. The authors use StyleGAN and Progressive GAN models pre-trained on multiple datasets.

  2. With a given pre-trained generator $G(z; \theta_0)$ (yes, we discard the discriminator), we can generate an unlimited number of synthetic images. Generating an image $x_i$ requires a latent code $z_i$ (just a random vector sampled from a multivariate normal distribution). Thus, for a given $z_i$, $x_i = G(z_i; \theta_0)$.

  3. The user wants to apply a change so that the new image is $x_{*i}$. The original generator $G(z_i; \theta_0)$ would not be able to produce $x_{*i}$, since the changed image represents a target distribution that the GAN was not trained on. We thus need to find updated weights $\theta_1$ such that $x_{*i} \approx G(z_i; \theta_1)$. Interesting!

  4. Note that $\theta$ represents the trainable weights (parameters) of the GAN generator. State-of-the-art GANs have a huge number of parameters, so fitting all of them to a handful of edited examples easily overfits. The aim here is a rewritten GAN that generalizes.

  5. The authors propose two modifications to the standard approach, both of which work with the hidden features of the generator:

    • Instead of updating all of $θ$, the authors proposed to modify the weights of one layer. Thus all other layers are frozen/made non-trainable.
    • The objective function (here, an $L_2$ loss) is applied in the output feature space of that layer instead of to the generator's output.
  6. So now, given a layer $L$, let $k$ be the feature output of the frozen $(L-1)^{th}$ layer. We can write the feature output of the $L^{th}$ layer as $v = f(k; W_0)$. That is, the output $v$ of the target layer is a function $f$ of the input $k$ and the pre-trained weights $W_0$.

  7. For a latent code $z_i$, the output of the first $L-1$ layers is $k_{*i}$. (Note that the subscript $*i$ on $k$ does not indicate a change in the rule specified by the user.) Thus, the output of the target layer $L$ is $v_i = f(k_{*i}; W_0)$.

  8. Each target example $x_{*i}$, which the user specifies manually, corresponds to a desired feature change $v_{*i}$ at layer $L$. The aim is a generator $G$ that can produce the target example $x_{*i}$ with minimal interference with its other behavior.

  9. This can be formulated as minimizing a simple joint objective function:

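    $W_1 = \underset{W}{\arg\min} \; \mathcal{L}_{smooth}(W) + \lambda \mathcal{L}_{constraint}(W)$, where $\mathcal{L}_{smooth}(W) = \mathbb{E}_z \left[ ||f(k; W_0) - f(k; W)||^2 \right]$ and $\mathcal{L}_{constraint}(W) = \sum_i ||v_{*i} - f(k_{*i}; W)||^2$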

    Thus, starting from the weights $W_0$, we update the target layer $L$ by minimizing the sum of a smoothness loss and a constraint loss. The smoothness loss keeps the layer's outputs under the new weights close to those of the initial (pre-trained) weights, while the constraint loss ensures that the specific rule is modified or added.

    Here, $||\cdot||^2$ denotes the squared $L_2$ norm.
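To make this objective concrete, here is a minimal NumPy sketch under simplifying assumptions that are not from the paper: the target layer is purely linear, $f(k; W) = Wk$; the keys are random stand-ins for features produced by the frozen layers; the expectation in the smoothness loss is approximated by averaging over a fixed sample of keys; there is a single edited pair $(k_*, v_*)$; and the weights are fitted by plain gradient descent. The constraint weight `lam`, the step size `lr`, and all sizes are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_keys = 16, 8, 500  # illustrative sizes

# Hypothetical pre-trained weights W_0 of a linear target layer f(k; W) = W k.
W0 = rng.normal(size=(d_out, d_in))
# Stand-ins for keys produced by the frozen layers for many latent codes z (one per column).
K = rng.normal(size=(d_in, n_keys))

# A single user-specified edit: key k_* should now produce value v_*.
k_star = rng.normal(size=(d_in, 1))
v_star = rng.normal(size=(d_out, 1))

lam = 10.0  # weight on the constraint loss (assumed value)
lr = 1e-3   # gradient-descent step size (assumed value)


def losses(W):
    """Return (smoothness loss, constraint loss) for weights W."""
    # Mean over sampled keys of ||f(k; W_0) - f(k; W)||^2.
    smooth = np.mean(np.sum((W0 @ K - W @ K) ** 2, axis=0))
    # ||v_* - f(k_*; W)||^2 for the edited key.
    constraint = np.sum((v_star - W @ k_star) ** 2)
    return smooth, constraint


W = W0.copy()  # start from the pre-trained weights
for _ in range(2000):
    # Analytic gradients of the two squared-error terms with respect to W.
    grad_smooth = 2.0 * (W - W0) @ K @ K.T / n_keys
    grad_constraint = -2.0 * (v_star - W @ k_star) @ k_star.T
    W -= lr * (grad_smooth + lam * grad_constraint)

print("before:", losses(W0))
print("after: ", losses(W))
```

The smoothness term starts at zero and grows only as much as the edit requires, while the constraint error drops sharply. Raising `lam` trades more collateral change for a tighter fit to the edit.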

2. The Analogy of Associative Memory

  1. Any matrix $W$ can be used as an associative memory that stores a set of key-value pairs $\{(k_i, v_i)\}$. A stored value can be retrieved by multiplying the matrix with its key:

    $v_i \approx Wk_i$

  2. Recall is only approximate because, in practice, such a memory bank is not error-free. Error-free recall is possible when the keys form a set of mutually orthogonal unit-norm vectors. However, with $N$-dimensional keys, at most $N$ mutually orthogonal keys can exist, so such an error-free memory can store at most $N$ pairs.

  3. The authors treat the convolutional weights of the target layer $L$ as an associative memory. That is, instead of thinking of the layer as a collection of convolutional filtering operations, the layer is viewed as a memory that associates keys with values.

  4. Each key is a feature vector at a single spatial location, while its value is the corresponding arrangement of output pixels.

  5. To store more than $N$ non-orthogonal keys $\{k_i\}$, we drop the requirement of exact equality $v_i = Wk_i$ and instead minimize the total error:

    $W_0 \overset{\Delta}{=} \underset{W}{\arg\min} \sum_{i} ||v_i - Wk_i||^2$

    This is a least-squares problem whose solution $W_0$ satisfies the normal equation $W_0KK^T = VK^T$. Here $K$ and $V$ are matrices whose $i$-th columns are the $i$-th key and the $i$-th value, respectively. More on least-squares approximation here.
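Both regimes of the associative-memory view can be checked numerically. The toy NumPy sketch below uses arbitrary illustrative sizes (key dimension `N`, value dimension `M`) and random data, not anything from the paper: it first stores $N$ orthonormal keys exactly with $W = \sum_i v_i k_i^T$, then fits $W_0$ to more pairs than key dimensions by solving the normal equation.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 8, 4  # key dimension N and value dimension M (illustrative)

# Case 1: N mutually orthogonal unit-norm keys are stored without error.
Q, _ = np.linalg.qr(rng.normal(size=(N, N)))  # columns of Q are orthonormal
K_orth = Q                                     # one key per column
V_orth = rng.normal(size=(M, N))               # one value per column
W_orth = V_orth @ K_orth.T                     # W = sum_i v_i k_i^T
print(np.allclose(W_orth @ K_orth, V_orth))    # exact recall: v_i = W k_i for every i

# Case 2: more pairs than key dimensions, so the keys cannot all be orthogonal.
n_pairs = 50
K = rng.normal(size=(N, n_pairs))
V = rng.normal(size=(M, n_pairs))
# Solve the normal equation W_0 K K^T = V K^T (transposed, since K K^T is symmetric).
W0 = np.linalg.solve(K @ K.T, K @ V.T).T
print(np.sum((V - W0 @ K) ** 2))               # nonzero: recall is only approximate
```

Solving the linear system directly avoids forming an explicit inverse of $KK^T$; the result matches what a generic least-squares solver such as `np.linalg.lstsq` returns.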

3. How to Update $W$ to Insert a new Value?

To add a new rule or modify an existing one, we have to modify the pre-trained weights $W_0$ of layer $L$. The user provides a single key $k_*$ and the new value $v_*$ it should map to, $k_* \rightarrow v_*$. The modified weight matrix $W_1$ should satisfy two conditions: it must store the new pair exactly, $v_* = W_1k_*$, and it should preserve the existing key-value pairs as well as possible, i.e. keep $||V - W_1K||^2$ small.

This modified $W_1$ is given by,

$W_1 = \underset{W}{\arg\min} ||V - WK||^2$ subject to $v_* = Wk_*$

As in point 5 of the previous section, this is a least-squares problem, but this time it is a constrained least-squares (CLS) problem, which can be solved exactly using Lagrange multipliers:

$W_1KK^T = VK^T + \Lambda k_*^T$, where $\Lambda \in \mathbb{R}^n$ is a vector of Lagrange multipliers.

From point 5 of the previous section, we can replace $VK^T$ with $W_0KK^T$. The modified equation is

$W_1KK^T = W_0KK^T + \Lambda k_*^T$. Multiplying both sides on the right by $(KK^T)^{-1}$, which is symmetric, gives

$W_1 = W_0 + \Lambda (C^{-1}k_*)^T$ where $C \overset{\Delta}{=} KK^T$

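There are two interesting points about the last equation. First, the update $\Lambda (C^{-1}k_*)^T$ is the outer product of two vectors, so it is a rank-one change to $W_0$. Second, the direction of the update, $C^{-1}k_*$, depends only on the key $k_*$ and the key statistics $C$, not on the new value.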

Note: $C$ is a model constant that can be pre-computed and cached. Only $\Lambda$, which specifies the magnitude of the change, depends on the target value $v_*$.
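Here is a minimal NumPy sketch of this rank-one update on a toy linear memory (random keys and values, illustrative sizes; not the authors' code). The closed form for $\Lambda$ used below is an added step: substituting $W_1$ into the constraint gives $W_0k_* + \Lambda (C^{-1}k_*)^Tk_* = v_*$, so $\Lambda = (v_* - W_0k_*)/(k_*^TC^{-1}k_*)$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, n_pairs = 8, 4, 50  # illustrative sizes

K = rng.normal(size=(N, n_pairs))     # existing keys, one per column
V = rng.normal(size=(M, n_pairs))     # existing values, one per column
C = K @ K.T                           # C = K K^T depends only on the keys: compute once, cache
W0 = np.linalg.solve(C, K @ V.T).T    # least-squares memory satisfying W_0 C = V K^T

k_star = rng.normal(size=N)           # new key chosen by the user
v_star = rng.normal(size=M)           # value the key should now map to

d = np.linalg.solve(C, k_star)        # update direction d = C^{-1} k_*, depends only on the key
# Substitute W_1 = W_0 + Lambda d^T into the constraint W_1 k_* = v_*:
#   W_0 k_* + Lambda (d . k_*) = v_*  =>  Lambda = (v_* - W_0 k_*) / (d . k_*)
Lam = (v_star - W0 @ k_star) / (d @ k_star)
W1 = W0 + np.outer(Lam, d)            # W_1 = W_0 + Lambda (C^{-1} k_*)^T

print(np.allclose(W1 @ k_star, v_star))   # the new rule is stored exactly
print(np.linalg.matrix_rank(W1 - W0))     # rank of the weight change
```

Because `C` depends only on the stored keys, it can be computed once; each new edit then costs one linear solve for `d` and one outer product.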

4. Generalizing to a Non-Linear layer

So far, the discussion has assumed a linear layer. However, the convolutional layers of a generator, and of deep networks in general, contain several components that break this purely linear picture, such as biases, activations (ReLU), normalization, and style modulation.

The authors have generalized the idea of associative memory such that:

The authors have expanded this idea to insert the desired change for more than one key at once.
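The following is a hedged sketch of one natural way to carry the linear result over to a non-linear layer, not the paper's implementation. It assumes a toy layer $f(k; W) = \tanh(Wk + b)$ standing in for a real generator layer, keeps the rank-one update direction $d = C^{-1}k_*$ from the linear case, and fits only the vector $\Lambda$ by gradient descent so that $f(k_*; W_0 + \Lambda d^T) \approx v_*$. The activation, sizes, step size, and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, n_keys = 8, 4, 200  # illustrative sizes


def f(k, W, b):
    """Toy non-linear layer (an assumption for illustration): tanh(W k + b)."""
    return np.tanh(W @ k + b)


K = rng.normal(size=(N, n_keys))         # sample of keys from the frozen layers
C = K @ K.T                              # cached key statistics, as in the linear case
W0 = 0.1 * rng.normal(size=(M, N))       # hypothetical pre-trained weights
b = 0.1 * rng.normal(size=M)             # bias term, kept fixed during the edit

k_star = rng.normal(size=N)              # edited key
v_star = rng.uniform(-0.5, 0.5, size=M)  # target value, inside the range of tanh

d = np.linalg.solve(C, k_star)           # reuse the rank-one direction d = C^{-1} k_*
s = d @ k_star                           # d . k_* > 0 because C is positive definite

Lam = np.zeros(M)
lr = 0.5 / s**2                          # scaled so each step moves W k_* + b by O(residual)
for _ in range(2000):
    out = f(k_star, W0 + np.outer(Lam, d), b)
    resid = v_star - out
    # Gradient of 0.5 * ||v_* - f(k_*; W_0 + Lam d^T)||^2 with respect to Lam.
    grad = -resid * (1.0 - out**2) * s
    Lam -= lr * grad

W1 = W0 + np.outer(Lam, d)
print(np.abs(v_star - f(k_star, W1, b)).max())  # remaining error on the edited key
```

Fitting only `Lam` keeps the edit rank-one even though `f` is non-linear. For this element-wise toy there is a closed form, $\Lambda = (\operatorname{arctanh}(v_*) - W_0k_* - b)/(d \cdot k_*)$, which is handy for checking the loop; a real convolutional layer with normalization and style modulation offers no such shortcut, so an iterative optimizer would be needed there.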

User Interface

If you are frustrated with all this mathematics, the good news is that the authors were kind enough to build us a user interface to edit a GAN easily. The video linked above is an excellent demonstration of this user interface tool by the authors.


-> Figure 3: The copy-paste-context interface for rewriting a model. (Source) <-

This interface provides a three-step rewriting process:

Try out the interface in Google Colab $\rightarrow$


-> Figure 4: User interface as seen after running the linked Colab notebook <-

Putting it all Together and Ending Note

The authors have used the user interface and the idea behind rewriting generative models to showcase multiple use cases. Some of them are:

In the words of the authors,

"Machine learning requires data, so how can we create effective models for data that do not yet exist? Thanks to the rich internal structure of recent GANs, in this paper, we have found it feasible to create such models by rewriting the rules within existing networks."

This is a truly novel idea.

Thank you for sticking with this report so far. I wrote it to provide an easy entry point to the paper. If something is still unclear or I have made a mistake, please leave your feedback in the comment section.

Finally, this report showcases the rich writing experience one can get while writing such W&B reports. I hope you enjoyed reading it. Thank you.