The Geometry of Deep Generative Image Models

Interpretable GANs using Hessian EigenDecomposition. Made by Sayantan Das using Weights & Biases
Sayantan Das

Contents

  1. TLDR
  2. The Metric Tensor
  3. Properties of the Hessian matrix
  4. Top eigenvectors capture significant image changes
  5. GAN Latent Spaces are highly anisotropic
  6. Conclusion
FigA. How a generator works. Image created by Sayantan Das

1. TLDR

This paper aims to make GAN interpretability (and, by extension, GAN inversion) tractable. Through the lens of differential geometry, the authors show that the metric tensor of the image manifold has useful interpretability properties, by realising it as the Hessian matrix of a distance metric in image space (LPIPS in this case). Experiments show that perturbing latent codes along eigenvectors of this Hessian gives controllable generation in image space. Moreover, the ranking of the eigenvectors matters: eigenvectors with larger eigenvalues lead to more significant image changes.

2. The Metric Tensor

2.1 Establishing Riemannian Geometry

Because a generator provides a smooth map onto image space, one relevant conceptual model for GAN latent space is a Riemannian manifold.
To define a Riemannian geometry, we need a smooth map and a notion of distance on it, defined by the metric tensor. For image applications, the relevant notion of distance lives in image space rather than in code space. Thus, we can pull back the distance function on image space onto the latent space; differentiating this pulled-back distance function gives a differential geometric structure (a Riemannian metric) on the image manifold.
A GAN generator G(z) parameterizes a submanifold of the image space with z \in \mathbb{R}^n. Thus, given a distance function on image space D: \mathbb{I} \times \mathbb{I} \rightarrow \mathbb{R}_+, (\mathbb{I}_1,\mathbb{I}_2) \mapsto D(\mathbb{I}_1,\mathbb{I}_2), we can define the distance between two latent codes as the distance between the images they generate; i.e., we pull the distance function back to latent space through G:
d: \mathbb{R}^n\times \mathbb{R}^n \rightarrow \mathbb{R}_+, d(z_1,z_2) = D(G(z_1),G(z_2))
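As a concrete illustration, here is a minimal sketch of this pullback in PyTorch, using the lpips package for D; the pretrained generator G and the 128-dimensional latent space are assumptions for illustration, not the paper's exact setup:

```python
import torch
import lpips  # pip install lpips

# Image-space distance D (LPIPS); it expects image tensors scaled to [-1, 1].
D = lpips.LPIPS(net='squeeze')

def d(G, z1, z2):
    """Pullback distance on latent space: d(z1, z2) = D(G(z1), G(z2))."""
    return D(G(z1), G(z2)).squeeze()

# Hypothetical usage with a pretrained generator G and a 128-d latent space:
# z1, z2 = torch.randn(1, 128), torch.randn(1, 128)
# print(d(G, z1, z2))
```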

2.2 Enter Hessian

According to Palais (1957), the Hessian matrix (the matrix of second-order partial derivatives) of the squared distance d^2 can be seen as the metric tensor of the image manifold.
Consider d^2 with z_0 held fixed, as a function of z: f_{z_0}(z) = d^2(z_0,z). Clearly z = z_0 is a local minimum of f_{z_0}(z), so f_{z_0}(z) can be locally approximated by a positive semi-definite quadratic form H(z_0).
This squared vector norm approximates the squared image distance,
d^2(z_0, z_0 + \delta_z) \approx \lVert \delta_z \rVert^2_H = \delta_z^T H(z_0) \delta_z.
In this paper, the Hessian matrix plays the role of the metric tensor, and this report will use the two terms interchangeably.

3. Properties of the Hessian matrix

The metric tensor at z_0 is the Hessian of the squared distance:
d^2(z_0, z_0 + \delta_z) \approx \delta_z^T \frac{\partial^2 d^2(z_0,z)}{\partial z^2}\Big|_{z_0} \delta_z, \quad H(z_0) := \frac{\partial^2 d^2(z_0,z)}{\partial z^2}\Big|_{z_0}
We call \alpha_H(v) = \frac{v^T H v}{v^T v} the speed of image change along v, as measured by the metric H.

3.1 Numerical Method

The Learned Perceptual Image Patch Similarity (LPIPS) metric is used as the image distance because it is twice differentiable, which is required to compute H.
We compute the Hessian by building a computational graph for the gradient g(z) = \partial_z d^2|_{z=z_0} and then computing the gradient of each element of g(z). This method computes H column by column, so its time complexity is proportional to the latent-space dimensionality n times the backpropagation time through this graph.
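A minimal sketch of this column-by-column computation via double backprop, using torch.autograd.functional.hessian; the generator G, the LPIPS module D, and the convention of squaring the LPIPS score are assumptions carried over from the sketch above:

```python
import torch

def hessian_at(G, D, z0):
    """Hessian of f(z) = d^2(z0, z) at z = z0, computed column by column via double backprop."""
    x0 = G(z0).detach()  # reference image G(z0), held fixed

    def f(z):
        # LPIPS output treated as the distance d and squared; drop the **2 if you
        # prefer to treat the LPIPS score directly as the squared distance.
        return D(G(z.unsqueeze(0)), x0).squeeze() ** 2

    return torch.autograd.functional.hessian(f, z0.squeeze(0))

# H = hessian_at(G, D, z0)                    # (n, n) metric tensor at z0
# eigvals, eigvecs = torch.linalg.eigh(H)     # eigenvalues in ascending order
```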
Alternatively, the Hessian is a linear operator, which can be defined implicitly as long as one can compute Hessian-vector products (HVPs).
Since the gradient with respect to z commutes with the inner product with v, the HVP can be rewritten either as the gradient of v^T g, or as the directional derivative of the gradient along v, v^T \partial_z g.
The first form, \partial_z(v^T g), is easy to compute with reverse-mode autodiff, while the directional derivative is easy to compute with forward-mode autodiff or a finite difference. Lanczos iterations are then applied to the HVP operator, defined in either of these two ways, to solve for the largest eigenpairs, from which an approximate Hessian matrix can be reconstructed.
HVP: v \rightarrow Hv
= \partial_z(v^T g(z))
= v^T \partial_z g(z)
\approx \frac{g(z + \epsilon v) - g(z - \epsilon v)}{2\epsilon}
From the appendix,
HVP_{backward}: v \rightarrow \partial_z(v^T g(z))
HVP_{forward}: v \rightarrow \frac{g(z + \epsilon v) - g(z - \epsilon v)}{2\epsilon}
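A sketch of the HVP-plus-Lanczos route, wrapping the backward-mode HVP in a scipy LinearOperator so that eigsh (ARPACK's Lanczos solver) can extract the top eigenpairs; G, D, and z0 are the same assumed objects as above, and this is not the authors' exact implementation:

```python
import numpy as np
import torch
from scipy.sparse.linalg import LinearOperator, eigsh

def top_eigenpairs(G, D, z0, k=20):
    """Largest eigenpairs of H(z0) via Lanczos iterations on the backward-mode HVP."""
    n = z0.numel()
    x0 = G(z0).detach()
    z = z0.clone().squeeze(0).requires_grad_(True)
    d2 = D(G(z.unsqueeze(0)), x0).squeeze() ** 2
    # Gradient g(z) with its graph kept, so it can be differentiated again.
    g, = torch.autograd.grad(d2, z, create_graph=True)

    def hvp(v):
        v_t = torch.as_tensor(np.asarray(v).reshape(-1), dtype=z.dtype)
        # Backward-mode HVP: Hv = d/dz (v^T g(z))
        hv, = torch.autograd.grad(g, z, grad_outputs=v_t, retain_graph=True)
        return hv.detach().cpu().numpy()

    op = LinearOperator((n, n), matvec=hvp)
    eigvals, eigvecs = eigsh(op, k=k, which='LM')   # top-k eigenpairs of H
    return eigvals, eigvecs

# eigvals, eigvecs = top_eigenpairs(G, D, z0)
# H_approx = eigvecs @ np.diag(eigvals) @ eigvecs.T   # low-rank reconstruction of H
```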

3.2 Connection to Jacobian

Let \phi(z): \mathbb{R}^N \rightarrow \mathbb{R}^M map a latent code to the feature map of an intermediate layer of the generator, and define d_\phi^2(z_1,z_2) = \frac{1}{2}\lVert\phi(z_1) - \phi(z_2)\rVert^2_2; this defines a manifold in \mathbb{R}^M. The metric tensor H_\phi of this manifold can be derived as the Hessian of d^2_\phi.
Note that there is a simple relationship between the Hessian H_\phi of d^2_\phi and the Jacobian J_\phi of \phi:
H_\phi(z_0) = \frac{\partial^2}{\partial z^2} \frac{1}{2} \lVert \phi(z_0) - \phi(z)\rVert^2_2\Big|_{z_0} = J_\phi(z_0)^T J_\phi(z_0),
v^T H_\phi(z_0) v = \lVert J_\phi(z_0) v\rVert^2, \quad J_\phi(z_0) = \partial_z \phi(z)|_{z_0}.
From this we see that the eigenspectrum of the Hessian H_\phi is the square of the singular-value spectrum of the Jacobian J_\phi, and that the eigenvectors of H_\phi are the same as the right singular vectors of J_\phi.
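This relationship is easy to check numerically on a small toy map; the two-layer network below is purely illustrative and stands in for an intermediate layer of a real generator:

```python
import torch
from torch.autograd.functional import hessian, jacobian

torch.manual_seed(0)
# Toy feature map phi: R^8 -> R^16, standing in for an intermediate generator layer.
phi = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.Tanh(), torch.nn.Linear(32, 16))
z0 = torch.randn(8)

def d2_phi(z):
    # 0.5 * ||phi(z0) - phi(z)||^2 with z0 held fixed
    return 0.5 * (phi(z0).detach() - phi(z)).pow(2).sum()

H_phi = hessian(d2_phi, z0)     # metric tensor at z0
J_phi = jacobian(phi, z0)       # (16, 8) Jacobian at z0
print(torch.allclose(H_phi, J_phi.T @ J_phi, atol=1e-5))  # should print True
```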

4. Top eigenvectors capture significant image changes

Steps

  1. At a latent code z_0, compute H(z_0) and its eigendecomposition.
  2. Perturb z_0 along eigenvectors of different ranks and generate the images G(z_0 + \lambda u_k) (sketched in code at the end of this section).
  3. Compare the resulting images, visually and with LPIPS.

Observation

  1. Perturbations along eigenvectors with larger eigenvalues produce larger image changes, both by visual inspection and as measured by LPIPS.
  2. Eigenvectors at different ranks encode different types of changes.
Extracted from the paper. Images change at different rates along top vs bottom eigenvectors
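A minimal sketch of the perturbation experiment described in the steps above, assuming eigvecs holds the eigenvectors of H(z_0) sorted from largest to smallest eigenvalue; the ranks and step sizes are arbitrary choices:

```python
import torch

def perturb_along_eigvecs(G, z0, eigvecs, ranks=(0, 5, 20), steps=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Generate G(z0 + lam * u_k) for eigenvectors u_k at several ranks.
    eigvecs: (n, n) array whose columns are sorted from largest to smallest eigenvalue."""
    images = {}
    with torch.no_grad():
        for r in ranks:
            u = torch.as_tensor(eigvecs[:, r], dtype=z0.dtype).reshape(1, -1)
            images[r] = torch.cat([G(z0 + lam * u) for lam in steps])
    return images  # rank -> (len(steps), C, H, W) batch of images

# images = perturb_along_eigvecs(G, z0, eigvecs)
```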

5. GAN Latent Spaces are highly anisotropic

FigB. Method inspired by Kornblith et al. (2019) to gauge the global geometry of the latent space.

Steps

  1. Randomly sample 100-1000 latent codes z and compute H(z) using backprop.
  2. Perform an eigendecomposition of each H(z).
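A sketch of these two steps, reusing the hypothetical hessian_at helper from the Section 3.1 sketch and assuming a 128-dimensional latent space; the sample count is arbitrary:

```python
import torch

def sample_spectra(G, D, n_samples=100, latent_dim=128):
    """Sample latent codes, compute H(z) at each, and collect the eigenvalue spectra."""
    spectra = []
    for _ in range(n_samples):
        z = torch.randn(1, latent_dim)
        H = hessian_at(G, D, z)                           # metric tensor at z (Section 3.1 sketch)
        spectra.append(torch.linalg.eigvalsh(H).flip(0))  # eigenvalues, sorted descending
    return torch.stack(spectra)                           # (n_samples, latent_dim)

# spectra = sample_spectra(G, D)   # one spectrum per sampled latent code
```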

Observation

Only a small fraction of dimensions is responsible for large image changes, as shown by the sudden dip in the eigenvalue spectra. Below are some of these plots of spectra versus dimension for layers of a DCGAN generator.
The authors measure the speed of image change along a vector v as \alpha_H(v) = \frac{v^T H v}{v^T v}. They use this to further strengthen their argument about the anisotropy of the latent space by showing analytically (see Appendix A.6 of the paper) that the variance of \alpha_H(v) over random directions v is much smaller than the variance among the eigenvalues.
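A quick numerical illustration of this argument, assuming a precomputed metric tensor H: the speed along random directions concentrates tightly, while the eigenvalues span orders of magnitude:

```python
import torch

def alpha(H, v):
    """Speed of image change along v, as measured by the metric H."""
    return (v @ H @ v) / (v @ v)

def anisotropy_check(H, n_dirs=1000):
    eigvals = torch.linalg.eigvalsh(H)                   # spectrum of the metric tensor
    dirs = torch.randn(n_dirs, H.shape[0], dtype=H.dtype)
    speeds = torch.stack([alpha(H, v) for v in dirs])    # alpha_H over random directions
    # Random directions mix many eigen-directions, so their speeds cluster tightly,
    # while the eigenvalues themselves span several orders of magnitude.
    return speeds.std() / speeds.mean(), eigvals.std() / eigvals.mean()

# rel_spread_random, rel_spread_eig = anisotropy_check(H)
```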

5.B Comments on the metric tensor's global geometry

The metric tensor H(z) provides information about the local geometry only. To inspect the global consistency of the metric tensor, the authors employ the Pearson correlation coefficient in the following way:
- At position z_i, compute H_i; its eigendecomposition yields eigenvectors U_i = [u_1, ..., u_n].
- Compute \Lambda_{ij}, the action of the metric tensor H_j at position z_j on the eigenvectors U_i, with entries u_k^T H_j u_k.
- Similarly, \Lambda_j collects the corresponding terms at position z_j itself (i.e. the eigenvalues of H_j).
- Compute the Pearson correlation coefficient corr(\Lambda_{ij}, \Lambda_j), which measures the consistency of the metric tensor across the two positions.
As the spectrum usually spanned several orders of magnitude, the authors computed the correlation on the log scale.
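A sketch of this consistency measure for one pair of sampled positions z_i and z_j, assuming their Hessians H_i and H_j have already been computed as above:

```python
import numpy as np

def metric_consistency(H_i, H_j, eps=1e-12):
    """Pearson correlation between log(Lambda_ij) and log(Lambda_j), per the steps above."""
    _, U_i = np.linalg.eigh(H_i)                       # columns of U_i are eigenvectors of H_i
    lam_ij = np.einsum('ak,ab,bk->k', U_i, H_j, U_i)   # u_k^T H_j u_k for each eigenvector u_k
    lam_j = np.linalg.eigvalsh(H_j)                    # spectrum of H_j itself
    # Spectra span several orders of magnitude, so correlate on the log scale.
    return np.corrcoef(np.log(lam_ij + eps), np.log(lam_j + eps))[0, 1]

# consistency = metric_consistency(H_i, H_j)   # close to 1 => consistent metric tensors
```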

Takeaway

This shows that the local directions that induce image changes of different orders of magnitude are highly consistent at different points in the latent space. Because of this, the notion of a "global" Hessian makes sense.
As the latent space gets warped and mapped into image space, directions in latent space are scaled differently by the Jacobian of the map. Picture by Sayantan Das

Conclusion

In this work, the authors developed an efficient and architecture-agnostic way to compute the geometry of the manifold learnt by generative networks. This method discovers the axes accounting for the largest variation in image transformation, which frequently represent semantically interpretable changes.
Subsequently, this geometric method can facilitate image manipulation, increase explainability, and accelerate optimization on the manifold (with or without gradients).

Note

I suggest that readers looking for a more in-depth understanding of this paper thoroughly check out Section 5 of the paper and the remainder of the Appendix.
Readers can check out the remainder of the wandb runs that are not included in this report here, and feel free to drop questions in the comment box below.