Rewriting a Deep Generative Model: An Overview

In this article, we will explore the work presented in the paper "Rewriting a Deep Generative Model" by Bau et al. It shows a new way of looking at deep neural networks. This is a translated version of the article. Feel free to report any possible mistranslations in the comments section.

Ayush Thakur

Created on August 26|Last edited on August 26


In the words of the authors:
"Deep network training is a blind optimization procedure where programmers define objectives but not the solutions that emerge. In this paper, we ask if deep networks can be created in a different way. Can the rules in a network be directly rewritten?"
The Paper | The Code | Interactive Google Colab →
The blind optimization procedure is investigated quantitatively in the paper "Deep Ensembles: A Loss Landscape Perspective" by Fort et al. Sayak Paul and I explored that paper in this article.
The usual recipe for creating a deep neural network is to train a model on a massive dataset with a defined objective function. This takes a considerable amount of time and is expensive in most cases. The authors of Rewriting a Deep Generative Model propose a method to create new deep networks by rewriting the rules of an existing pre-trained network, as shown in Figure 1. By doing so, they wish to enable novice users to easily modify and customize a model without the training time and computational cost of large-scale machine learning.

Figure 1: Rewriting a GAN without training to remove the watermark, to add people, and to replace the tower with a tree. (Source)
They do so by setting up a new problem statement: manipulation of specific rules encoded by a deep generative model.
If you are unfamiliar with deep generative models, here is my take on them.
Table of Contents
Why Is Rewriting Deep Generative Model Useful?
An Overview of the Paper
User Interface
Putting it all Together and Ending Note
﻿
Why Is Rewriting Deep Generative Model Useful?
Deep generative models such as GANs can learn rich semantic and physical rules about a target distribution (faces, etc.). However, it usually takes weeks to train a state-of-the-art GAN on any dataset.
If the target distribution changes by some amount, retraining a GAN from scratch would be a waste of resources. What if we instead directly change some of the rules in the trained GAN to reflect the change in the target distribution? By rewriting a GAN:
We can build a new model without going through retraining, which is the more involved task.
From the perspective of demystifying deep neural nets, the approach to edit a model gives new insight about the model and how semantic features are captured.
It can also provide some insight into the generalization of deep models to unseen scenarios.
Unlike conventional image editing tools where the desired change is applied on a single image, by editing a GAN, one can apply the edit on every image generated.
Using this tool, one can build new generative models without domain expertise, training time, and computational expense.
﻿
 	
An Overview of the Paper
The usual practice is to train a new model for every new version (a slightly different target distribution) of the same dataset. Rewriting a specific rule without affecting the other rules captured by the model saves a lot of training and computing time.
How can we edit generative models? In the words of the authors,
"...we show how to generalize the idea of a linear associative memory to a nonlinear convolutional layer of a deep generator. Each layer stores latent rules as a set of key-value relationships over hidden features. Our constrained optimization aims to add or edit one specific rule within the associative memory while preserving the existing semantic relationships in the model as much as possible. We achieve it by directly measuring and manipulating the model's internal structure, without requiring any new training data."
To specify the rules, the authors have provided an easy-to-use interface. The video linked above has a clear explanation of using this interface.
Try out the interface in Google Colab $\rightarrow$
1. Change Rule With Minimal Collateral Damage
We start with a pre-trained GAN. The authors used pre-trained StyleGAN and Progressive GAN models trained on multiple datasets.
With a given pre-trained generator $G(z; \theta_0)$ (yes, we are discarding the discriminator), we can generate any number of synthetic images. To generate an image $x_i$, a latent code $z_i$ (just a random vector sampled from a multivariate normal distribution) is required. Thus, for a given $z_i$, $x_i = G(z_i; \theta_0)$.
The user wants to apply a change such that the new image is $x_{*i}$. The generator $G(z_i; \theta_0)$ cannot produce $x_{*i}$, since the changed image represents a target distribution that the GAN was not trained on. We thus need to find updated weights $\theta_1$ such that $x_{*i} \approx G(z_i; \theta_1)$. Interesting!
Note that $\theta$ represents the trainable weights or parameters of our GAN generator. Standard SOTA GANs have a huge number of parameters, which makes overfitting easy. The aim here is a rewritten GAN that generalizes.
The authors made two modifications to the standard approach to tackle this manipulation of hidden features in the generator.
Instead of updating all of $\theta$, the authors propose modifying the weights of a single layer. All other layers are frozen (made non-trainable).
The objective function (say, an $L_1$ loss) is applied in the output feature space of that layer instead of at the generator's output.
So now, given a layer $L$, let $k$ be the feature output of the frozen $(L-1)^{th}$ layer. We can write the feature output of the $L^{th}$ layer as $v = f(k; W_0)$. Thus the output of the target layer, $v$, is defined as a function $f$ of the input $k$ and the pre-trained weights $W_0$.
For a latent code $z_i$, the output of the first $L-1$ layers is $k_{*i}$. (Note that the $*i$ subscript on $k$ does not indicate a change made by the user's rule.) Thus the output of the target layer $L$ would be $v_i = f(k_{*i}; W_0)$.
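To make the roles of $k$, $v$, and $W_0$ concrete, here is a minimal sketch of splitting a generator at a target layer. The three-layer dense network, its sizes, and the helper `run_layers` are hypothetical stand-ins chosen for illustration; the generators in the paper are convolutional and much larger.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical stand-in for G(z; theta_0): three dense ReLU layers.
# A real StyleGAN or Progressive GAN generator is convolutional and far larger.
weights = [
    rng.normal(size=(16, 8)),   # layer 1
    rng.normal(size=(16, 16)),  # layer 2 (the target layer L)
    rng.normal(size=(4, 16)),   # layer 3
]

def run_layers(x, layer_weights):
    """Apply the given layers in order."""
    for w in layer_weights:
        x = relu(w @ x)
    return x

L = 2                                # 1-based index of the target layer
W0 = weights[L - 1]                  # pre-trained weights of layer L
z = rng.normal(size=8)               # latent code z_i

k = run_layers(z, weights[:L - 1])   # output of the first L-1 (frozen) layers
v = relu(W0 @ k)                     # v = f(k; W0), output of layer L
x = run_layers(v, weights[L:])       # remaining frozen layers produce the output

# Splitting the generator at layer L does not change its output.
assert np.allclose(x, run_layers(z, weights))
```

Only `W0` would be modified by a rewrite; everything before it merely produces keys, and everything after it merely renders values into an image.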
For each target example $x_{*i}$, which is specified manually by the user, there is a feature change $v_{*i}$. The aim is to obtain a generator $G$ that can produce the target example $x_{*i}$ with minimal interference with the generator's other behavior.
This can be solved by minimizing a simple joint objective function.
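In the layer notation introduced above, the joint objective can be sketched as follows. This is a reconstruction from the description of the two loss terms, not a verbatim copy of the paper's equation; the weighting factor $\lambda$ and the expectation over keys $k$ are assumptions.

$$
\begin{aligned}
W_1 &= \underset{W}{\arg\min}\; \mathcal{L}_{smooth}(W) + \lambda\, \mathcal{L}_{constraint}(W) \\
\mathcal{L}_{smooth}(W) &\overset{\Delta}{=} \mathbb{E}_{k}\left[\, ||f(k; W_0) - f(k; W)||^2 \,\right] \\
\mathcal{L}_{constraint}(W) &\overset{\Delta}{=} \sum_i ||v_{*i} - f(k_{*i}; W)||^2
\end{aligned}
$$

The first term penalizes any change in the layer's output on ordinary inputs, and the second term pulls the output for the user's chosen keys toward the requested values.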
﻿
﻿
﻿
Thus, we are updating the weights $W_0$ of the target layer $L$ by jointly minimizing a smooth loss and a constraint loss. The smooth loss ensures that the output produced under the new rule stays close to the initial (pre-trained) output. The constraint loss ensures that the specific rule is modified or added.
Also, $||\cdot||^2$ denotes the $L_2$ loss.
2. The Analogy of Associative Memory
Any matrix $W$ can be used as an associative memory that stores a set of key-value pairs $\{(k_i, v_i)\}$, which are retrieved by matrix multiplication:
$v_i \approx W k_i$
If the keys are mutually orthogonal unit vectors, a matrix with $N$ columns can store up to $N$ such pairs without error.
The weights of layer $L$ of the generator are treated as this kind of memory: a key is a feature vector entering the layer, and its value is the layer's output for that key.
When more than $N$ pairs are stored, the keys $k_i$ cannot all be orthogonal, so $v_i = W k_i$ cannot hold exactly for every pair. The memory is then defined as the matrix with the smallest total error:
$W_0 \overset{\Delta}{=} \underset{W}{\arg\min} \sum_i ||v_i - W k_i||^2$
This is a least-squares problem, and $W_0$ satisfies the normal equation $W_0 K K^T = V K^T$, where $K$ is the matrix whose columns are the keys $k_i$ and $V$ is the matrix whose columns are the values $v_i$.
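A small NumPy sketch can make the associative-memory view tangible. It shows two cases: error-free recall when the keys are orthonormal (an assumption made for the illustration), and the least-squares memory obtained from the normal equation $W_0 K K^T = V K^T$ when there are more pairs than key dimensions. All sizes and the random data are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 3, 4                       # value dimension, key dimension

# Case 1 (assumed for illustration): N orthonormal keys give error-free recall.
Q, _ = np.linalg.qr(rng.normal(size=(N, N)))
K_orth = Q                        # columns are orthonormal keys k_i
V_orth = rng.normal(size=(M, N))  # columns are the values v_i
W_orth = V_orth @ K_orth.T        # sum over i of v_i k_i^T
recall_error = np.abs(W_orth @ K_orth - V_orth).max()

# Case 2: more pairs than key dimensions, so recall is only approximate.
P = 10
K = rng.normal(size=(N, P))
V = rng.normal(size=(M, P))
C = K @ K.T                       # N x N and symmetric
# Normal equation W0 C = V K^T; since C is symmetric, solve C W0^T = K V^T.
W0 = np.linalg.solve(C, K @ V.T).T

# Cross-check against NumPy's generic least-squares solver.
W0_lstsq = np.linalg.lstsq(K.T, V.T, rcond=None)[0].T
```

Solving the normal equation with `np.linalg.solve` avoids forming an explicit inverse of `C`, which is usually the more stable choice.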
3. How to Update $W$ to Insert a new Value?
To add a new rule or modify an existing one, we have to modify the pre-trained weights of layer $L$, given by $W_0$. The user provides a single key and the new value to assign to it, $k_* \rightarrow v_*$. The modified weight matrix $W_1$ should satisfy two conditions:
It should store a new value.
It should continue to minimize errors in all the previously stored values.
This modified $W_1$ is given by
$W_1 = \underset{W}{\arg\min} ||V - WK||^2$ subject to $v_* = W_1 k_*$
Like the problem at the end of the previous section, this is a least-squares problem. This time, however, it is a constrained least-squares (CLS) problem, which can be solved exactly as:
$W_1 K K^T = V K^T + \Lambda k_*^T$ where $\Lambda \in \mathbb{R}^M$ (one entry per row of $W$)
From the normal equation of the previous section, we can replace $V K^T$ with $W_0 K K^T$. The modified equation is
$W_1 K K^T = W_0 K K^T + \Lambda k_*^T$ or,
$W_1 = W_0 + \Lambda (C^{-1} k_*)^T$ where $C \overset{\Delta}{=} K K^T$
There are two interesting points about this last equation:
For the requested mapping $k_* \rightarrow v_*$, the final form of the equation turns the soft error-minimization objective into a hard constraint: the weights may only be updated along one particular straight-line direction, $C^{-1} k_*$.
The update direction $C^{-1} k_*$ is determined only by the overall key statistics and the specific targeted key $k_*$.
Note: $C$ is a model constant that can be pre-computed and cached. Only $\Lambda$, which specifies the magnitude of the change, depends on the target value $v_*$.
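The linear update can be checked numerically. In the sketch below, $\Lambda$ is obtained by substituting $W_1 = W_0 + \Lambda d^T$ (with $d = C^{-1} k_*$) into the constraint $v_* = W_1 k_*$, which gives $\Lambda = (v_* - W_0 k_*) / (d^T k_*)$. That closed form is a direct consequence of the equations above rather than something quoted from the paper, and the sizes, random data, and the name `stored_error` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, P = 3, 4, 50

K = rng.normal(size=(N, P))          # stored keys (columns)
V = rng.normal(size=(M, P))          # stored values (columns)
C = K @ K.T                          # key statistics, can be cached
W0 = np.linalg.solve(C, K @ V.T).T   # least-squares memory: W0 C = V K^T

k_star = rng.normal(size=N)          # key selected by the user
v_star = rng.normal(size=M)          # new value to store for that key

d = np.linalg.solve(C, k_star)       # update direction d = C^{-1} k_*
# Substituting W1 = W0 + Lambda d^T into v_* = W1 k_* gives Lambda.
Lam = (v_star - W0 @ k_star) / (d @ k_star)
W1 = W0 + np.outer(Lam, d)           # rank-one change to the weights

def stored_error(W):
    """Total squared error on the previously stored pairs."""
    return np.sum((V - W @ K) ** 2)

# Another matrix that also maps k_* to v_*: add a perturbation E with E k_* = 0.
R = rng.normal(size=(M, N))
E = R - np.outer(R @ k_star, k_star) / (k_star @ k_star)
W_other = W1 + 0.1 * E
```

Comparing `stored_error(W1)` with `stored_error(W_other)` illustrates the second condition: among matrices that store the new value, the rank-one update disturbs the old pairs the least.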
4. Generalizing to a Non-Linear layer
So far, the discussion has assumed a linear setting. However, the convolutional layers of a generator generally have several non-linear components such as biases, activations (ReLU), normalization, style modules, etc.
The authors have generalized the idea of associative memory as follows:
They first define the update direction $d \overset{\Delta}{=} C^{-1} k_*$, which was derived in the previous section.
Then, in order to insert a new key-value pair $k_* \rightarrow v_*$, they begin with $W_0$ and perform an optimization over the rank-one subspace defined by the row vector $d^T$. First, this optimization is solved:
$\Lambda_1 = \underset{\Lambda \in \mathbb{R}^M}{\arg\min} ||v_* - f(k_*; W_0 + \Lambda d^T)||$
Once $\Lambda_1$ is calculated, the weights are updated as
$W_1 = W_0 + \Lambda_1 d^T$
The authors have expanded this idea to insert the desired change for more than one key at once.
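For a single key, the non-linear procedure can be sketched with plain gradient descent over $\Lambda$. Everything specific here is an assumption for illustration: the layer $f(k; W) = \tanh(Wk)$, the sizes, the step size, and the use of the squared norm (which has the same minimizer as the norm). The paper's layers are convolutional, and its optimizer may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, P = 4, 8, 200

def f(k, W):
    """Hypothetical non-linear layer: tanh applied to W k."""
    return np.tanh(W @ k)

K = rng.normal(size=(N, P))                # sample keys used for statistics
C = K @ K.T / P                            # key statistics (1/P only rescales d)
W0 = rng.normal(size=(M, N)) / np.sqrt(N)  # stand-in for pre-trained weights

k_star = rng.normal(size=N)
k_star /= np.linalg.norm(k_star)           # unit-norm target key
v_star = np.tanh(rng.uniform(-1.0, 1.0, size=M))  # a reachable target value

d = np.linalg.solve(C, k_star)             # update direction d = C^{-1} k_*
s = d @ k_star                             # scalar d^T k_*, positive since C is PD

def loss(Lam):
    return np.sum((v_star - f(k_star, W0 + np.outer(Lam, d))) ** 2)

Lam = np.zeros(M)
step = 0.2 / s**2                          # step size scaled by (d^T k_*)^2
loss_before = loss(Lam)
for _ in range(2000):
    out = f(k_star, W0 + np.outer(Lam, d))
    # Gradient of the squared error with respect to Lambda (chain rule).
    grad = -2.0 * (v_star - out) * (1.0 - out**2) * s
    Lam -= step * grad
loss_after = loss(Lam)

W1 = W0 + np.outer(Lam, d)                 # rank-one rewrite of the layer
```

Because only the $M$ entries of $\Lambda$ are optimized, the change to the weights stays rank one no matter how many gradient steps are taken.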
User Interface
If you are frustrated with all this mathematics, the good news is that the authors were kind enough to build a user interface for editing a GAN easily. The video linked above is an excellent demonstration of this tool by the authors.
﻿
Figure 3: The copy-paste-context interface for rewriting a model. (Source)
﻿
This interface provides a three-step rewriting process:
Copy - the user uses a brush to select an interesting region in an image, which defines the target value $V_*$.
Paste - the user pastes the copied value into a single target image, which specifies the $K_* \rightarrow V_*$ constraint.
Context - for generalization, the user selects the same interesting region in multiple images. This establishes the update direction $d$ for the associative memory.
Try out the interface in Google Colab $\rightarrow$
Figure 4: User interface as seen after running the linked Colab notebook.
Putting it all Together and Ending Note
The authors have used the user interface and the idea behind rewriting generative models to showcase multiple use cases. Some of them are:
Putting objects/interesting regions into a new context. You can add trees to the top of the building or a dome in place of a pointy tower. You can also put the moustache in place of an eyebrow!
Removing undesired features like a watermark in an image or some artifacts in an image. This is a handy outcome of this idea.
In the words of the authors,
"Machine learning requires data, so how can we create effective models for data that do not yet exist? Thanks to the rich internal structure of recent GANs, in this paper, we have found it feasible to create such models by rewriting the rules within existing networks."
This is truly a novel bit.
Thank you for sticking so far with this article. I wrote this article with the intent to provide an easy start to the paper. If something is still not clear or I made some mistakes, please provide your feedback in the comment section.
Finally, this article showcases the rich writing experience one can get while writing such W&B reports. I hope you enjoyed reading it. Thank you.
﻿
