
Rewriting a Deep Generative Model: An Overview

In this article, we will explore the work presented in the paper "Rewriting a Deep Generative Model" by Bau et al. It shows a new way of looking at deep neural networks.
In the words of the authors:
"Deep network training is a blind optimization procedure where programmers define objectives but not the solutions that emerge. In this paper, we ask if deep networks can be created in a different way. Can the rules in a network be directly rewritten?"

The Paper | The Code | Interactive Google Colab →

The blind optimization procedure is investigated quantitatively in the paper "Deep Ensembles: A Loss Landscape Perspective" by Fort et al. Sayak Paul and I explored that paper in this article.
The usual recipe for creating a deep neural network is to train such a model on a massive dataset with a defined objective function. This takes a considerable amount of time and is expensive in most cases. The authors of Rewriting a Deep Generative Model propose a method to create new deep networks by rewriting the rules of an existing pre-trained network, as shown in Figure 1. By doing so, they aim to enable novice users to easily modify and customize a model without the training time and computational cost of large-scale machine learning.


Figure 1: Rewriting a GAN without training to remove a watermark, to add people, and to replace a tower with a tree. (Source)
They do so by setting up a new problem statement: manipulation of specific rules encoded by a deep generative model.
If you are unfamiliar with deep generative models, here is my take on them.

Why Is Rewriting a Deep Generative Model Useful?

Deep generative models such as GANs can learn rich semantic and physical rules about a target distribution (faces, etc.). However, it usually takes weeks to train a state-of-the-art GAN on any dataset.
If the target distribution changes slightly, retraining the GAN would be a waste of resources. What if, instead, we directly change some of the rules the trained GAN has learned, to reflect the change in the target distribution? By rewriting a GAN:
  • We can build a new model without retraining, which is a far more involved task.
  • From the perspective of demystifying deep neural nets, the ability to edit a model gives new insight into how it captures semantic features.
  • It can also provide some insight into how deep models generalize to unseen scenarios.
  • Unlike conventional image-editing tools, where the desired change is applied to a single image, an edit to a GAN applies to every image it generates.
  • Using this tool, one can build new generative models without domain expertise, training time, and computational expense.


An Overview of the Paper

The usual practice is to train a new model for every new version (i.e., a slightly different target distribution) of the same dataset. By rewriting a specific rule without affecting the other rules captured by the model, a lot of training and computing time is saved.
How can we edit generative models? In the words of the authors,
"..we show how to generalize the idea of a linear associative memory to a nonlinear convolutional layer of a deep generator. Each layer stores latent rules as a set of key-value relationships over hidden features. Our constrained optimization aims to add or edit one specific rule within the associative memory while preserving the existing semantic relationships in the model as much as possible. We achieve it by directly measuring and manipulating the model's internal structure, without requiring any new training data."
To specify the rules, the authors provide an easy-to-use interface. The video linked above clearly explains how to use it.

Try out the interface in Google Colab →

1. Changing a Rule With Minimal Collateral Damage

  1. We start with a pre-trained GAN. The authors have used StyleGAN and Progressive GAN pre-trained models trained on multiple datasets.
  2. Given a pre-trained generator (yes, we are discarding the discriminator) $G(z; \theta_0)$, we can generate many (in principle, infinitely many) synthetic images. To generate an image $x_i$, a latent code $z_i$ (just a random vector sampled from a multivariate normal distribution) is required. Thus, for a given $z_i$: $x_i = G(z_i; \theta_0)$.
  3. The user wants to apply a change such that the new image is $x_{*i}$. The generator $G(z_i; \theta_0)$ cannot produce $x_{*i}$, since the changed image represents a target distribution that the GAN was not trained on. We thus need to find updated weights $\theta_1$ such that $x_{*i} \approx G(z_i; \theta_1)$. Interesting!
  4. Note that $\theta$ represents the trainable weights or parameters of our GAN generator. The number of parameters in standard state-of-the-art GANs is huge, which makes it easy to overfit the handful of edited examples. The aim here is a rewritten GAN that still generalizes.
  5. The authors propose two modifications to the standard approach to tackle this manipulation of hidden features in the generator:
    • Instead of updating all of $\theta$, only the weights of one layer are modified; all other layers are frozen (made non-trainable).
    • The objective function (say, an $L_1$ loss) is applied in the output feature space of that layer instead of at the generator's output.
  6. So, given a layer $L$, let $k$ be the feature output of the (frozen) $(L-1)^{th}$ layer. We can write the feature output of the $L^{th}$ layer as $v = f(k; W_0)$: the output $v$ of the target layer is a function $f$ of the input $k$ and the pre-trained weights $W_0$.
  7. For a latent code $z_i$, the output of the first $L-1$ layers is $k_{*i}$. (Note: the $*$ in $k_{*i}$ does not indicate a change in the rule specified by the user.) Thus, the output of the target layer $L$ would be $v_i = f(k_{*i}; W_0)$.
  8. Each target example $x_{*i}$, specified manually by the user, corresponds to a desired feature change $v_{*i}$. The aim is a generator $G$ that produces the target examples $x_{*i}$ with minimal interference with its other behavior.
  9. This can be solved by minimizing a simple joint objective (a minimal code sketch follows this list):
$$W_1 = \underset{W}{\arg\min} \; L_{smooth}(W) + \lambda L_{constraint}(W)$$
$$L_{smooth}(W) = \mathbb{E}_{z}\left[\, \| f(k; W_0) - f(k; W) \|^2 \,\right], \qquad L_{constraint}(W) = \sum_i \| v_{*i} - f(k_{*i}; W) \|^2$$
  10. Thus, we update the weights $W_0$ of the target layer $L$ by minimizing the smoothing and constraint losses jointly. The smoothing loss keeps the outputs generated under the new rule close to those of the initial (pre-trained) model, while the constraint loss enforces the specific rule being modified or added.
  11. Here $\| \cdot \|^2$ denotes the squared $L_2$ loss.
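Below is a minimal PyTorch sketch of this optimization (a toy illustration, not the authors' released code): a single convolution stands in for the target layer $L$, random tensors stand in for the hidden features $k$ and the user-specified targets $v_{*i}$, and `lam` plays the role of the trade-off weight $\lambda$ between the two losses.

```python
import torch
import torch.nn.functional as F

# Toy stand-in for the target layer L of the generator: a single convolution with
# pre-trained weights W0. In the real model, k would come from the frozen layers
# 1..L-1 and v would feed the frozen layers L+1 onward. All shapes are illustrative.
torch.manual_seed(0)
W0 = torch.randn(64, 32, 3, 3)             # pre-trained weights of layer L (kept frozen)
W1 = W0.clone().requires_grad_(True)       # the only weights we optimize

def f(k, W):
    """Feature output of layer L (here just a plain convolution)."""
    return F.conv2d(k, W, padding=1)

# Hidden features for ordinary latents (used by the smoothing term) ...
k_smooth = torch.randn(8, 32, 16, 16)
# ... and for the user-edited examples, with user-specified target features v_*.
k_star = torch.randn(2, 32, 16, 16)
v_star = f(k_star, W0) + 0.5 * torch.randn(2, 64, 16, 16)   # pretend edited targets

lam = 10.0                                  # trade-off weight between the two losses
opt = torch.optim.Adam([W1], lr=1e-3)
for step in range(200):
    L_smooth = (f(k_smooth, W1) - f(k_smooth, W0)).pow(2).mean()
    L_constraint = (f(k_star, W1) - v_star).pow(2).mean()
    loss = L_smooth + lam * L_constraint
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Only the copy of the target layer's weights is updated; everything else stays frozen, which is what keeps the collateral damage to the rest of the generator small.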

2. The Analogy of Associative Memory

  1. Any matrix $W$ can be used as an associative memory that stores a set of key-value pairs $\{(k_i, v_i)\}$. A stored pair can be retrieved by matrix multiplication such that
  2. $v_i \approx Wk_i$
  3. The equality is only approximate because, in practice, such a memory bank is not error-free. We could make it error-free by choosing keys that form a set of mutually orthogonal unit-norm vectors; however, an $N$-dimensional key space can hold at most $N$ mutually orthogonal keys.
  4. The authors treat the convolutional weights of the target layer $L$ as an associative memory. Instead of thinking of the layer as a collection of convolutional filtering operations, the layer is considered a memory that associates keys to values.
  5. Each key is the feature vector at a single location, while the corresponding value is the resulting arrangement of output pixels.
  6. To support more than $N$ (non-orthogonal) keys $\{k_i\}$, instead of requiring exact equality $v_i = Wk_i$, the error is minimized such that
  7. $W_0 \overset{\Delta}{=} \underset{W}{\arg\min} \sum_{i} \| v_i - Wk_i \|^2$
  8. This is a least-squares problem, whose minimal solution can be found by solving the normal equation $W_0KK^T = VK^T$ for $W_0$. Here $K$ and $V$ are matrices whose $i$-th columns are the $i$-th key and value, respectively (see the sketch after this list). More on least-squares approximation here.
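To see the associative-memory view in action, here is a small NumPy sketch (all shapes and data are illustrative and hypothetical): it stacks random key/value pairs into $K$ and $V$, solves the normal equation $W_0KK^T = VK^T$, and checks how well $W_0 k_i$ retrieves $v_i$ when more than $N$ keys are stored.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, P = 64, 32, 500            # value dim, key dim, number of stored pairs (illustrative)
K = rng.standard_normal((N, P))  # i-th column is the key k_i
V = rng.standard_normal((M, P))  # i-th column is the value v_i

# Fit the associative memory by least squares: solve the normal equation W0 K K^T = V K^T.
C = K @ K.T                                # second-moment statistics of the keys
W0 = np.linalg.solve(C, K @ V.T).T         # equals V K^T (K K^T)^{-1}

# Retrieval is just a matrix multiply: v_i ≈ W0 k_i (only approximate, since P > N).
rel_err = np.linalg.norm(V - W0 @ K) / np.linalg.norm(V)
print(f"relative retrieval error: {rel_err:.3f}")
```

Because there are more stored pairs than key dimensions, retrieval is approximate, which is exactly the regime point 6 above describes.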

3. How to Update W to Insert a New Value?

To add a new rule or modify an existing one, we have to modify the pre-trained weights of layer $L$, given by $W_0$. The user provides a single key to which a new value should be assigned: $k_* \rightarrow v_*$. The modified weight matrix $W_1$ should satisfy two conditions:
  • It should store a new value.
  • It should continue to minimize errors in all the previously stored values.
This modified $W_1$ is given by
$W_1 = \underset{W}{\arg\min} \| V - WK \|^2$ subject to $v_* = W_1 k_*$
Like point 7 of the previous section, this is a least-squares problem; this time, however, it is a constrained least-squares (CLS) problem, which can be solved exactly as
$W_1KK^T = VK^T + \Lambda k_*^T$, where $\Lambda \in \mathbb{R}^M$ is a vector of Lagrange multipliers.
Using the normal equation from the previous section, we can replace $VK^T$ with $W_0KK^T$. The modified equation is
$W_1KK^T = W_0KK^T + \Lambda k_*^T$, or
$W_1 = W_0 + \Lambda (C^{-1}k_*)^T$, where $C \overset{\Delta}{=} KK^T$
There are two interesting points about this last equation:
  • For the requested mapping $k_* \rightarrow v_*$, it transforms the soft error-minimization objective into a hard constraint under which the weights are updated in one specific direction, $C^{-1}k_*$.
  • The update direction $C^{-1}k_*$ is determined only by the overall key statistics and the specific targeted key $k_*$.
Note: $C$ is a model constant that can be pre-computed and cached. Only $\Lambda$, which specifies the magnitude of the change, depends on the target value $v_*$. A small NumPy sketch of this rank-one update in the linear case follows.
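Here is a self-contained NumPy sketch of that rank-one update in the purely linear case (again with hypothetical shapes and random data): it precomputes $C$ and the direction $d = C^{-1}k_*$, then picks $\Lambda$ so that the hard constraint $W_1 k_* = v_*$ holds exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, P = 64, 32, 500                    # value dim, key dim, number of stored pairs (illustrative)
K = rng.standard_normal((N, P))          # existing keys (columns)
V = rng.standard_normal((M, P))          # existing values (columns)
C = K @ K.T                              # key statistics C = K K^T (precomputable and cacheable)
W0 = np.linalg.solve(C, K @ V.T).T       # pre-trained memory satisfying W0 K K^T = V K^T

# New rule to insert: k_* -> v_*
k_star = rng.standard_normal(N)
v_star = rng.standard_normal(M)

d = np.linalg.solve(C, k_star)           # update direction d = C^{-1} k_*
# Choose the magnitude Lambda so the hard constraint W1 k_* = v_* holds exactly:
#   W1 k_* = W0 k_* + Lambda (d^T k_*)   =>   Lambda = (v_* - W0 k_*) / (d^T k_*)
Lam = (v_star - W0 @ k_star) / (d @ k_star)
W1 = W0 + np.outer(Lam, d)               # rank-one update W1 = W0 + Lambda d^T

print(np.allclose(W1 @ k_star, v_star))                          # the new rule is stored exactly
print(np.linalg.norm((W1 - W0) @ K) / np.linalg.norm(W0 @ K))    # relative change on the old keys
```

The update touches only a rank-one subspace of the weights, which is why the previously stored key-value behavior is largely preserved.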

4. Generalizing to a Non-Linear Layer

So far, the discussion has assumed a linear setting. However, the convolutional layers of a generator are, in general, non-linear: they include biases, activations (ReLU), normalization, a style module, and so on.
The authors have generalized the idea of associative memory such that:
  • They first define the update direction $d \overset{\Delta}{=} C^{-1}k_*$, as derived in the previous section.
  • Then, to insert a new key $k_* \rightarrow v_*$, they begin with $W_0$ and perform an optimization over the rank-one subspace defined by the row vector $d^T$. First, this optimization is solved:
  • $\Lambda_1 = \underset{\Lambda\in\mathbb{R}^{M}}{\arg\min} \| v_* - f(k_*; W_0 + \Lambda d^T) \|$
  • Once $\Lambda_1$ is computed, the weights are updated as
  • $W_1 = W_0 + \Lambda_1 d^T$
The authors have expanded this idea to insert the desired change for more than one key at once.
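As a rough illustration of the non-linear case, the PyTorch sketch below optimizes only the magnitude vector $\Lambda$ over the rank-one subspace. The layer $f$ here is a hypothetical stand-in (a linear map followed by a ReLU) rather than the generator's actual convolution, normalization, and style machinery, and the direction $d$ is assumed to have been precomputed as $C^{-1}k_*$.

```python
import torch

torch.manual_seed(0)
M, N = 64, 32                                   # illustrative output / key dimensions

def f(k, W, b):
    """Toy non-linear stand-in for layer L: a linear map followed by a ReLU."""
    return torch.relu(W @ k + b)

W0 = torch.randn(M, N)                          # pre-trained weights of the target layer
b = torch.randn(M)                              # bias (one of the layer's non-linear extras)
d = torch.randn(N)                              # update direction d = C^{-1} k_* (assumed precomputed)
k_star = torch.randn(N)                         # key selected by the user
v_star = torch.relu(torch.randn(M))             # user-specified target value

# Optimize only the magnitude vector Lambda over the rank-one subspace W0 + Lambda d^T.
Lam = torch.zeros(M, requires_grad=True)
opt = torch.optim.Adam([Lam], lr=0.05)
for step in range(500):
    W = W0 + torch.outer(Lam, d)
    # Gradient descent drives f(k_*; W0 + Lambda d^T) toward v_*, as far as the ReLU allows.
    loss = (v_star - f(k_star, W, b)).pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

W1 = (W0 + torch.outer(Lam, d)).detach()        # rewritten weights for layer L
```

Restricting the optimization to $\Lambda$ (rather than all of $W$) is what carries the "one direction, minimal collateral damage" idea over to the non-linear layer.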

User Interface

If all this mathematics frustrates you, the good news is that the authors were kind enough to build a user interface for editing a GAN easily. The video linked above is an excellent demonstration of this tool by the authors.

Figure 3: The copy-paste-context interface for rewriting a model. (Source)

This interface provides a three-step rewriting process:
  • Copy - the user uses a brush to copy an interesting region of an image, defining the target value $v_*$.
  • Paste - the user pastes the copied value into a single target image, specifying the $k_* \rightarrow v_*$ constraint.
  • Context - for generalization, the user selects the same region of interest in multiple images. This establishes the update direction $d$ for the associative memory.

Try out the interface in Google Colab →


Figure 4: User interface as seen after running the linked Colab notebook.

Putting it all Together and Ending Note

The authors have used the user interface and the idea behind rewriting generative models to showcase multiple use cases. Some of them are:
  • Putting objects or interesting regions into a new context: you can add trees to the top of a building, or a dome in place of a pointy tower. You can even put a moustache in place of an eyebrow!
  • Removing undesired features, such as a watermark or other artifacts in an image. This is a handy outcome of the idea.
In the words of the authors,
"Machine learning requires data, so how can we create effective models for data that do not yet exist? Thanks to the rich internal structure of recent GANs, in this paper, we have found it feasible to create such models by rewriting the rules within existing networks."
This is a truly novel idea.
Thank you for sticking with this article so far. I wrote it with the intent of providing an easy entry point into the paper. If something is still not clear, or if I made a mistake, please share your feedback in the comments section.
Finally, this article showcases the rich writing experience one can get while writing such W&B reports. I hope you enjoyed reading it. Thank you.