GANspace: An Overview of Discovering Interpretable GAN Controls
This article provides an overview of the unsupervised discovery of interpretable feature directions in pre-trained generative adversarial networks (GANs).
Since their inception in 2014, Generative Adversarial Networks (GANs) have been at the forefront of generative modeling research. GANs, with their millions of parameters, have found extensive use in modeling the probability densities of various complex data modalities and generating ultra-realistic samples, including photorealistic images, near-human speech synthesis, music generation, and photorealistic video synthesis.
Thanks to this expressiveness, along with their likelihood-free density estimation framework, GANs have found application in various challenging problems. One such area of interest involves modeling densities with interpretable conditionals, in order to generate samples having certain desired features. For example, controlling the intensity of features of human face images such as smile, color, orientation, gender, and facial structure.
On top of the various inherent difficulties in training GANs, this problem poses an additional challenge: the lack of a proper metric for labeling the intensity of individual sample features, which rules out the trivial solution of training the GAN under the supervision of such conditionals. In simpler terms, it is virtually impossible to build a dataset that measures the intensity of a smile in a headshot, making it impossible to perform this training in a supervised setting.
This work studies a novel approach for discovering GAN controls in the latent (prior) space in an unsupervised way. More importantly, the method doesn't need any retraining of the GAN. This makes it convenient to generate controlled samples from pretrained state-of-the-art GAN architectures like BigGAN, StyleGAN, and StyleGAN2.
Table of Contents
Table of Contents
Brief Primer on Relevant Background
Generative Adversarial Networks
Spectral Decomposition
Important Takeaway
Prior Space Exploration in GANs
Finding Principal Directions
StyleGAN-based Architectures (Feature-Based Priors)
BigGAN Based Architectures (Isotropic Priors)
Interpreting Principal Feature Directions
Brief Primer on Relevant Background
Generative Adversarial Networks
It is no surprise that real-world data arises from remarkably complex probability distributions. This complexity renders exact likelihood-based density estimation intractable. More formally, the likelihood integral in the Bayesian framework for posterior estimation becomes intractable to evaluate.
One workaround is to avoid the exact evaluation of this integral and approximate the true posterior with a simpler distribution $q_\phi(z \mid x)$, optimizing a variational lower bound on the likelihood. For example, the VAE by Kingma et al. uses a Gaussian $q_\phi(z \mid x)$ and maximizes the evidence lower bound $\mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)$. However, this approximation doesn't capture the intricacies of the likelihood distribution properly, resulting in poor-quality samples.
GANs, with their novel game-theoretic framework, sidestep the likelihood evaluation step entirely by introducing a neural network between the prior and the posterior. First proposed by Goodfellow et al. in the seminal paper "Generative Adversarial Networks", GANs model the density estimation process as a zero-sum game between two competing players. One player, the generator $G$, is modeled as a parametric function mapping the prior (commonly an isotropic Gaussian $z \sim \mathcal{N}(0, I)$) to the data (posterior) space. The other player, the discriminator $D$, parameterized by $\phi$, tries to classify samples as real or synthetic. At each optimization step, the players take turns improving their results until a Nash equilibrium is attained. The formal objective is defined as follows:
$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]$$
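As a rough illustration, the following PyTorch-style sketch shows one alternating optimization step of this minimax objective. It is a minimal sketch, not the training code of any architecture discussed here; the modules G and D, their optimizers, and the assumption that D returns one logit per sample are all hypothetical.

```python
import torch
import torch.nn.functional as F

# Assumed to exist: G (generator) and D (discriminator) are torch.nn.Module
# instances, opt_G and opt_D are their optimizers, real is a batch of data,
# latent_dim is the dimensionality of the isotropic Gaussian prior,
# and D returns a single logit per sample (shape [batch_size, 1]).

def gan_step(G, D, opt_G, opt_D, real, latent_dim):
    batch_size = real.size(0)
    ones = torch.ones(batch_size, 1)
    zeros = torch.zeros(batch_size, 1)

    # Discriminator step: maximize log D(x) + log(1 - D(G(z)))
    z = torch.randn(batch_size, latent_dim)          # z ~ N(0, I)
    fake = G(z).detach()                             # stop gradients into G
    d_loss = (F.binary_cross_entropy_with_logits(D(real), ones)
              + F.binary_cross_entropy_with_logits(D(fake), zeros))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator step (non-saturating form): make D believe G(z) is real
    z = torch.randn(batch_size, latent_dim)
    g_loss = F.binary_cross_entropy_with_logits(D(G(z)), ones)
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()

    return d_loss.item(), g_loss.item()
```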
Spectral Decomposition
Among the various canonical forms of matrix representation, the eigenvector-eigenvalue form finds the most usage in the machine learning literature. This is evident from its prevalence in recommendation systems, especially for dimensionality reduction. Eigenvalues have also found extensive use in invertible neural networks, where the singular values of weight matrices play a crucial role in preserving the (approximate) isometry of layers.
Formal Definition
Given a square matrix $A \in \mathbb{R}^{n \times n}$, considered as a linear map from some vector space onto itself, i.e., $A: \mathbb{R}^n \to \mathbb{R}^n$, an eigenvector of that vector space is defined as a vector $v$ whose transformation by $A$ results only in a scale transformation by some constant $\lambda$, i.e., $Av = \lambda v$, where $\lambda$ is known as the eigenvalue.
For the set of all eigenvectors and eigenvalues,
$$AV = V\Lambda,$$
where $V \in \mathbb{R}^{n \times r}$ (with $r$ the rank of the matrix $A$) is the matrix of eigenvectors whose $i^{th}$ column is the $i^{th}$ eigenvector, and $\Lambda$ is a diagonal matrix where each entry $\Lambda_{ii}$ is the eigenvalue of the corresponding eigenvector.
Therefore, for a full-rank (diagonalizable) $A$, $A = V \Lambda V^{-1}$ is defined as the eigendecomposition of matrix $A$.
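As a quick numerical sanity check (a minimal numpy sketch, unrelated to any GAN code), we can verify that $AV = V\Lambda$ and $A = V\Lambda V^{-1}$ hold for a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
A = A + A.T                      # symmetric, so eigenvalues/eigenvectors are real

eigvals, V = np.linalg.eig(A)    # columns of V are eigenvectors
Lambda = np.diag(eigvals)

assert np.allclose(A @ V, V @ Lambda)                    # A v_i = lambda_i v_i
assert np.allclose(A, V @ Lambda @ np.linalg.inv(V))     # eigendecomposition
```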
Despite the intuitive nature and widespread popularity of eigendecomposition, in practice a generalized version, called singular value decomposition (SVD), is used. SVD removes the square-matrix limitation of eigendecomposition by considering two vector spaces, the row space ($\mathbb{R}^n$) and the column space ($\mathbb{R}^m$), instead of one. The final decomposition provides orthonormal basis vectors (analogous to eigenvectors) of the row space and column space, along with the singular values (analogous to eigenvalues) accompanying them. This orthonormality also allows straightforward calculation of (pseudo-)inverses, making the decomposition computationally convenient.
Formal Definition
Let $M \in \mathbb{R}^{m \times n}$ be a linear map from a vector space $\mathbb{R}^n$ to a vector space $\mathbb{R}^m$, i.e., $M: \mathbb{R}^n \to \mathbb{R}^m$. Let $U \in \mathbb{R}^{m \times m}$ be the matrix of singular vectors in the column space, and $V \in \mathbb{R}^{n \times n}$ be the matrix of singular vectors in the row space. Without loss of generality from the previous definition, $MV = U\Sigma$, where $\Sigma$ is a diagonal matrix of singular values. Since $V$ is an orthogonal matrix ($V^{-1} = V^{T}$), the final decomposition is defined as:
$$M = U \Sigma V^{T}$$
Important Takeaway
The methods discussed above decompose a matrix into linearly independent vectors. Hence, for a full-rank matrix decomposition, the eigenvectors/singular vectors form a basis of the vector space(s). One important observation is that, since these vectors are normalized and form a basis, any vector in the space can be represented as a linear combination of them.
Note: The top-k column-space singular vectors (i.e., the column-space singular vectors associated with the k largest singular values) are called the k principal components. Approximating a matrix by keeping only its dominant singular vectors is also known as Principal Component Analysis (PCA).
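To make this concrete, here is a minimal numpy sketch (the data matrix and k are illustrative, not from the paper) of computing the top-k principal components of a set of feature vectors via SVD and projecting onto them:

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(1000, 512))   # 1000 feature vectors of dimension 512
k = 8

mu = Y.mean(axis=0)                # center the data before PCA
Yc = Y - mu

# Rows of Vt are the right singular vectors; the first k of them are the
# top-k principal components (directions of largest variance).
U, S, Vt = np.linalg.svd(Yc, full_matrices=False)
components = Vt[:k]                # shape (k, 512)

# Coordinates of each feature vector in the principal-component basis.
coords = Yc @ components.T         # shape (1000, k)
```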
Prior Space Exploration in GANs
Interpretable GAN control discovery, a.k.a. latent space disentanglement in GANs, has been a heavily studied problem. The most notable work, InfoGAN by Chen et al., maximized the mutual information between the generated output and a set of latent codes passed along with the prior.
However, this required retraining the GAN with an additional mutual-information loss term alongside the adversarial loss.
In this work, instead of conditional training, the authors explore a novel approach for finding "directions" of principal features in the prior space of the GAN. Increasing or decreasing the magnitude of the prior along such a direction increases or decreases the intensity of the corresponding feature in the generated sample. More formally,
Given a direction vector $d_i$ for the $i^{th}$ feature, with intensity $x_i$, the new prior is redefined as:
$$z' = z + x_i d_i$$
This can be rewritten in matrix form as:
$$z' = z + Dx$$
where $D$ is the matrix whose columns are the direction vectors $d_i$, and $x$ is a vector of intensities corresponding to each basis vector.
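As a small illustration (a sketch with made-up shapes, not code from the paper), applying such an edit to a latent vector is just a matrix-vector operation:

```python
import numpy as np

latent_dim, n_directions = 512, 8
rng = np.random.default_rng(0)

z = rng.normal(size=latent_dim)                  # original prior sample
D = rng.normal(size=(latent_dim, n_directions))  # columns = direction vectors d_i
x = np.zeros(n_directions)
x[2] = 3.0                                       # increase the intensity of feature 2

z_edited = z + D @ x                             # z' = z + Dx
```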
Finding Principal Directions
One of the primary observations presented in this work is the correlation of individual feature intensities with the principal components of the feature tensors in the early layers of the GAN. In other words, the authors observed that decomposing the feature tensors disentangles certain features along each column-space singular vector, where the dominant singular vectors encode the dominant features.
This raises an obvious question: how do principal components in the feature space assist in finding principal directions in the prior space? To answer this, the authors study two kinds of architectures: one with an isotropic prior (e.g., BigGAN) and one with learned feature vectors as the prior (e.g., StyleGAN, StyleGAN2).
StyleGAN-based Architectures (Feature-Based Priors)
This process is straightforward in StyleGAN-based architectures, where encoded feature vectors $w = M(z, c)$ are used as the prior for the synthesis network, with $M$ denoting the learned mapping (feature encoder) network and $c$ the class ID of the sample to generate (for unconditional models, simply $w = M(z)$).
Next, the principal components $V$, obtained by applying PCA to a set of sampled $w$ vectors, are used for interpolating on the feature intensities $x$. Formally,
$$w' = w + Vx$$
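A minimal sketch of this procedure is shown below; the mapping_network callable, sample counts, and shapes are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np

def stylegan_pca_directions(mapping_network, latent_dim=512, n_samples=10_000, k=8, seed=0):
    """Estimate principal directions in StyleGAN's learned W space via PCA."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_samples, latent_dim))   # z ~ N(0, I)
    w = mapping_network(z)                         # w = M(z), shape (n_samples, w_dim)

    mu = w.mean(axis=0)
    _, _, Vt = np.linalg.svd(w - mu, full_matrices=False)
    return Vt[:k].T, mu                            # columns are principal directions in W

# Usage (hypothetical): move a sample's w code along the j-th principal direction.
# V, mu = stylegan_pca_directions(mapping_network)
# w_edited = w + x_j * V[:, j]                     # w' = w + Vx, with a single nonzero x_j
```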
The following visualization is created by interpolating on the magnitude of $x$ for a desired direction vector in $V$.
BigGAN Based Architectures (Isotropic Priors)
For isotropic priors, the discovery of principal directions is a bit more complicated than for learned feature-based priors. This is expected, since the isotropic prior is not a learned latent space and encodes no information about the features of the data samples.
To work around this challenge, the authors use a projection method to map the principal components computed at an intermediate layer back to the prior space. More formally, $N$ samples $z_{1:N}$ are drawn from the prior $p(z)$. The feature vectors at the $j^{th}$ layer are computed as $y_i = \hat{G}_j(z_i)$, where $\hat{G}_j$ denotes the first $j$ layers of $G$. Computing the principal components of these feature vectors yields a low-rank basis matrix $V$, which is then used to obtain the PCA coordinates of each sample as follows:
$$x_i = V^{T}(y_i - \mu),$$
where $\mu$ is the mean of the $y_i$.
The basis vectors obtained are then transferred to the prior space by linear regression. In other words, the matrix $U$, where each column $u_k$ denotes the $k^{th}$ basis vector in the prior space, can be found by solving:
$$U = \arg\min_{U} \sum_{i} \left\| U x_i - z_i \right\|^{2}$$
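Putting these steps together, here is a minimal numpy sketch of the projection procedure; the early_layers callable, sample counts, and shapes are assumptions used for illustration, not the authors' released code.

```python
import numpy as np

def isotropic_prior_directions(early_layers, latent_dim=128, n_samples=10_000, k=8, seed=0):
    """Estimate principal directions in the isotropic prior of a BigGAN-style GAN."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_samples, latent_dim))   # z_i ~ N(0, I)
    y = early_layers(z)                            # y_i: flattened features at layer j

    # PCA of the intermediate features.
    mu = y.mean(axis=0)
    _, _, Vt = np.linalg.svd(y - mu, full_matrices=False)
    V = Vt[:k].T                                   # low-rank basis of the feature space

    # PCA coordinates of each sample: x_i = V^T (y_i - mu).
    x = (y - mu) @ V                               # shape (n_samples, k)

    # Transfer the basis to the prior: U = argmin_U sum_i ||U x_i - z_i||^2,
    # solved as a least-squares problem z ≈ x @ U^T.
    U_T, *_ = np.linalg.lstsq(x, z, rcond=None)    # shape (k, latent_dim)
    return U_T.T                                   # columns are direction vectors in Z

# Editing (hypothetical usage): z_edited = z + U @ x_edit for a chosen intensity vector x_edit.
```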
Now the interpolation on $z$ can be defined as:
$$z' = z + Ux$$
where $x_k$ in $x$ denotes the intensity of the $k^{th}$ dominant feature.
Interpreting Principal Feature Directions
The vector representing a feature (for example rotation, zoom, or background) is chosen by interpolating on the magnitude scalar associated with each direction vector in $V$ (for StyleGAN2) or in $U$ (for BigGAN) and inspecting the result. The limits for each edit are chosen by the same trial-and-error method. Let $E(v_k, x_k)$ denote the edit operation along direction $v_k$ (or $u_k$) by a factor of $x_k$, such that $x_k$ falls within the chosen range. The edits for each property for BigGAN and StyleGAN, as found by trial and error, are shown in the bottom-left of the figure below.

Irish Setter Interpolation - BigGAN

Man Face Interpolation - StyleGAN