
The Geometry of Deep Generative Image Models

Interpretable GANs using Hessian EigenDecomposition

Contents

  1. TLDR
  2. The Metric Tensor
  3. Properties of the Hessian matrix
  4. Top eigenvectors capture significant image changes
  5. GAN Latent Spaces are highly anisotropic
  6. Conclusion
FigA. How a generator works. Image created by Sayantan Das

1. TLDR

This paper aims to make GAN inversion and interpretability more tractable. Through the lens of differential geometry, the authors show that the metric tensor of the latent space has useful interpretability properties, by realising it as the Hessian matrix of a squared image distance (LPIPS in this case). Experiments show that perturbing latent codes along eigenvectors of this Hessian gives controllable generation in image space. Moreover, the eigenvalue ranking is meaningful: eigenvectors with larger eigenvalues produce larger image changes.

2. The Metric Tensor

2.1 Establishing Riemannian Geometry

Because a generator provides a smooth map onto image space, one relevant conceptual model for GAN latent space is a Riemannian manifold.
To define a Riemannian geometry, we need a smooth map and a notion of distance on it, given by the metric tensor. For image applications, the relevant notion of distance lives in image space rather than in code space, so we pull the image-space distance function back onto the latent space. Differentiating this pulled-back distance function twice gives a differential-geometric structure (a Riemannian metric) on the image manifold.
A GAN generator $G(z)$, with $z \in \mathbb{R}^n$, parameterizes a submanifold of the image space. Thus, given a distance function on image space $D: \mathbb{I} \times \mathbb{I} \rightarrow \mathbb{R}_+$, $(I_1, I_2) \mapsto L$, we can define the distance between two latent codes as the distance between the images they generate, i.e. pull the distance function back to latent space through $G$:

$$d: \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}_+, \qquad d(z_1, z_2) = D(G(z_1), G(z_2))$$

2.2 Enter Hessian

According to Palais (1957), the Hessian matrix (the matrix of second-order partial derivatives) of the squared distance $d^2$ can be seen as the metric tensor of the image manifold.
Consider $d^2$ with $z_0$ fixed, as a function of $z$: $f_{z_0}(z) = d^2(z_0, z)$. Clearly $z = z_0$ is a local minimum of $f_{z_0}(z)$, so $f_{z_0}(z)$ can be locally approximated by a positive semi-definite quadratic form $H(z_0)$.
This squared vector norm approximates the squared image distance:

$$d^2(z_0, z_0 + \delta z) \approx \lVert \delta z \rVert^2_H = \delta z^T H(z_0)\, \delta z$$
In this paper the Hessian matrix plays the role of the metric tensor, so this report will use the two terms interchangeably.

3. Properties of the Hessian matrix

We call $\alpha_H(v) = \frac{v^T H v}{v^T v}$ the speed of image change along $v$, as measured by the metric $H$.

$$d^2(z_0, z) \approx \delta z^T \left. \frac{\partial^2 d^2(z_0, z)}{\partial z^2} \right|_{z_0} \delta z, \qquad H(z_0) := \left. \frac{\partial^2 d^2(z_0, z)}{\partial z^2} \right|_{z_0}$$


3.1 Numerical Method

The Learned Perceptual Image Patch Similarity (LPIPS) metric is used as the image-space distance because it is twice differentiable, which is required to compute $H$.
We compute the Hessian by building a computational graph for the gradient $g(z) = \partial_z d^2|_{z=z_0}$ and then backpropagating through each element of $g(z)$. This method computes $H$ column by column, so its time cost scales with the latent-space dimensionality $n$ times the backpropagation time through this graph.
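A minimal sketch of this column-by-column computation via double backprop, reusing the stand-in `G` and LPIPS `D` from the earlier snippet (LPIPS is used directly as the squared distance $d^2$ here, a simplification). PyTorch's `torch.autograd.functional.hessian` offers an equivalent convenience routine.

```python
import torch

def hessian_backward(G, D, z0):
    """H(z0): Hessian of the image distance w.r.t. z, computed column by column."""
    z0 = z0.detach()
    ref = G(z0).detach()                      # reference image G(z0)
    z = z0.clone().requires_grad_(True)
    dist = D(G(z), ref).sum()                 # image distance, treated as d^2
    g, = torch.autograd.grad(dist, z, create_graph=True)   # gradient g(z)
    g = g.flatten()
    n = g.numel()
    H = torch.zeros(n, n)
    for i in range(n):                        # one extra backward pass per column
        col, = torch.autograd.grad(g[i], z, retain_graph=True)
        H[i] = col.flatten()
    return 0.5 * (H + H.T)                    # symmetrize against numerical noise
```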
The Hessian is a linear operator, and it is fully defined as long as one can compute Hessian-vector products (HVPs).
Since the gradient with respect to $z$ commutes with the inner product with $v$, the HVP can be rewritten either as the gradient of $v^T g$ or as the directional derivative of the gradient, $v^T \partial_z g$.
The first form, $\partial_z(v^T g)$, is easy to compute with reverse-mode autodiff, while the directional derivative is easy to compute with forward-mode autodiff or finite differences. Lanczos iterations are then applied to the HVP operator, defined in either of these two ways, to solve for the largest eigenpairs, from which an approximate Hessian matrix can be reconstructed.
$$\mathrm{HVP}: v \rightarrow Hv = \partial_z\big(v^T g(z)\big) = v^T \partial_z g(z) \approx \frac{g(z + \epsilon v) - g(z - \epsilon v)}{2 \lVert \epsilon v \rVert}$$

From the appendix,

$$\mathrm{HVP}_{backward}: v \rightarrow \partial_z\big(v^T g(z)\big), \qquad \mathrm{HVP}_{forward}: v \rightarrow \frac{g(z + \epsilon v) - g(z - \epsilon v)}{2 \lVert \epsilon v \rVert}$$
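The two HVP operators and the Lanczos step might look roughly as follows. This is a sketch under the same stand-in `G` and `D` as above, with SciPy's `eigsh` supplying the Lanczos iterations; the function names (`hvp_backward`, `grad_at`, `hvp_forward`, `top_eigenpairs`) are introduced here for illustration and are not the authors' implementation.

```python
import torch
from scipy.sparse.linalg import LinearOperator, eigsh

def hvp_backward(G, D, z0, v):
    """Hv = d_z(v^T g(z)): one extra reverse-mode pass through the gradient graph."""
    ref = G(z0).detach()
    z = z0.detach().clone().requires_grad_(True)
    g, = torch.autograd.grad(D(G(z), ref).sum(), z, create_graph=True)
    hv, = torch.autograd.grad((g.flatten() * v).sum(), z)
    return hv.flatten()

def grad_at(G, D, z0, z):
    """g(z) = d_z D(G(z), G(z0)) as a flat, detached tensor."""
    ref = G(z0).detach()
    z = z.detach().clone().requires_grad_(True)
    g, = torch.autograd.grad(D(G(z), ref).sum(), z)
    return g.flatten()

def hvp_forward(G, D, z0, v, eps=1e-2):
    """Hv ~ (g(z0 + eps*v) - g(z0 - eps*v)) / (2 ||eps*v||), central finite difference."""
    dz = (eps * v).view_as(z0)
    return (grad_at(G, D, z0, z0 + dz) - grad_at(G, D, z0, z0 - dz)) / (2 * eps * v.norm())

def top_eigenpairs(G, D, z0, k=40):
    """Largest-k eigenpairs of H(z0) via Lanczos (scipy eigsh) on the HVP operator."""
    n = z0.numel()
    def matvec(v_np):
        v = torch.as_tensor(v_np, dtype=torch.float32).flatten()
        return hvp_forward(G, D, z0, v).numpy()
    return eigsh(LinearOperator((n, n), matvec=matvec), k=k, which="LA")
```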


3.2 Connection to Jacobian

Let $\phi(z): \mathbb{R}^N \rightarrow \mathbb{R}^M$ map the latent code to the feature map of an intermediate layer of the generator, and define $d_\phi^2(z_1, z_2) = \frac{1}{2} \lVert \phi(z_1) - \phi(z_2) \rVert^2_2$; this again defines a manifold whose metric tensor $H_\phi$ is the Hessian of $d^2_\phi$.
Note that there is a simple relationship between this Hessian $H_\phi$ and the Jacobian $J_\phi$ of $\phi$:

$$H_\phi(z_0) = \left. \frac{\partial^2}{\partial z^2} \frac{1}{2} \lVert \phi(z_0) - \phi(z) \rVert^2_2 \right|_{z_0} = J_\phi(z_0)^T J_\phi(z_0)$$

$$v^T H_\phi(z_0) v = \lVert J_\phi(z_0) v \rVert^2, \qquad J_\phi(z_0) = \left. \partial_z \phi(z) \right|_{z_0}$$

From this we see that the eigenspectrum of the Hessian $H_\phi$ is the square of the singular-value spectrum of the Jacobian $J_\phi$, and the eigenvectors of $H_\phi$ are the right singular vectors of $J_\phi$.
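A quick numerical check of this relationship, using a small random smooth map as a stand-in for an intermediate-layer feature map $\phi$ (the map and dimensions below are illustrative assumptions):

```python
import torch
from torch.autograd.functional import jacobian, hessian

N, M = 8, 32
W = torch.randn(M, N)

def phi(z):                         # stand-in for an intermediate-layer feature map
    return torch.tanh(W @ z)

z0 = torch.randn(N)

def f(z):                           # d_phi^2(z0, z) = 1/2 ||phi(z0) - phi(z)||^2
    return 0.5 * (phi(z0) - phi(z)).pow(2).sum()

H_phi = hessian(f, z0)              # metric tensor at z0
J_phi = jacobian(phi, z0)           # M x N Jacobian at z0
print(torch.allclose(H_phi, J_phi.T @ J_phi, atol=1e-4))   # True: H_phi = J^T J

# Hence eig(H_phi) are the squared singular values of J_phi, and the eigenvectors
# of H_phi coincide with the right singular vectors of J_phi.
```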

4. Top eigenvectors capture significant image changes

Steps

  • Pick $z_0$ randomly.
  • Compute $H(z_0)$.
  • Eigendecompose $H(z_0) = \sum_i \lambda_i v_i v_i^T$.
  • Explore the image space along $G(z_0 + \mu_i v_i)$ (a sketch of these steps follows the list).
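A sketch of these steps, reusing `G`, `D`, and `hessian_backward` from the Section 3.1 snippets. The step sizes and the ranks inspected are arbitrary choices for illustration, not the paper's exact settings.

```python
import torch

z0 = torch.randn(1, 128)                     # step 1: random latent code
H = hessian_backward(G, D, z0)               # step 2: metric tensor at z0 (Section 3.1 sketch)
eigvals, eigvecs = torch.linalg.eigh(H)      # step 3: H = sum_i lambda_i v_i v_i^T
order = eigvals.argsort(descending=True)     # rank eigenvectors by eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

steps = torch.linspace(-4.0, 4.0, 9)         # step 4: explore G(z0 + mu * v_i)
for rank in (0, 10, 100):                    # a top, a middle, and a low-rank direction
    v = eigvecs[:, rank].view(1, -1)
    with torch.no_grad():
        frames = torch.cat([G(z0 + mu * v) for mu in steps])
    # save or plot `frames` as one row per rank to compare how fast the image changes
```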

Observation

  1. The larger the eigenvalue, the larger the image change along its eigenvector, both by visual inspection and as measured by LPIPS.
  2. Eigenvectors at different ranks encode different types of changes.
Extracted from the paper. Images change at different rates along top vs bottom eigenvectors

5. GAN Latent Spaces are highly anisotropic

FigB. Method inspired by Kornblith et al. (2019) to gauge the global geometry of the latent space.

Steps

  1. Randomly sample 100-1000 $z$ in the latent space and compute $H(z)$ using backprop.
  2. Perform EigenDecomposition.

Observation

Only a small fraction of dimensions is responsible for large image changes, as seen from the sudden dip in the eigenvalue spectra. Below are spectra (eigenvalue versus dimension) for layers of a DCGAN generator.

[W&B run set: eigenvalue spectra versus dimension for DCGAN generator layers]
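For readers reproducing such plots outside W&B, here is a minimal sketch that draws eigenvalue spectra at a few random latent points on a log scale; it reuses the LPIPS Hessian sketch from Section 3.1 rather than per-layer feature Hessians, so it only approximates the panels above.

```python
import torch
import matplotlib.pyplot as plt

for _ in range(5):                                    # a handful of random latent points
    H = hessian_backward(G, D, torch.randn(1, 128))   # Section 3.1 sketch
    eigvals = torch.linalg.eigvalsh(H).flip(0)        # sort descending
    plt.semilogy(eigvals.clamp_min(1e-12).numpy())
plt.xlabel("eigenvalue rank")
plt.ylabel("eigenvalue (log scale)")
plt.show()
```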

The authors measure the speed of image change along a vector $v$ as $\alpha_H(v) = \frac{v^T H v}{v^T v}$, and use it to further strengthen the anisotropy argument: they show analytically (see Appendix A.6) that the variance of $\alpha_H(v)$ over random directions $v$ is smaller than the variance among the eigenvalues.
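A small sketch of that comparison, reusing the Hessian routine from Section 3.1: the spread of $\alpha_H(v)$ over random directions is contrasted with the spread of the eigenvalues themselves.

```python
import torch

H = hessian_backward(G, D, torch.randn(1, 128))   # metric tensor at a random z (Section 3.1 sketch)

def alpha(H, v):
    # speed of image change along v, as measured by the metric H
    return (v @ H @ v) / (v @ v)

eigvals = torch.linalg.eigvalsh(H)                # spread across eigen-directions
rand_alphas = torch.stack([alpha(H, v) for v in torch.randn(1000, H.shape[0])])
print("std across eigenvalues:      ", eigvals.std().item())
print("std across random directions:", rand_alphas.std().item())
```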

5.B Comments on the metric tensor's global geometry

$H(z)$, the metric tensor, captures local geometry. To inspect the global consistency of the metric tensor, the authors use the Pearson correlation coefficient in the following way:
- At position $z_i$ we compute $H_i$; its eigendecomposition yields eigenvectors $U_i = [u_1, ..., u_n]$.
- We compute $\Lambda_{ij}$, the action of the metric tensor $H_j$ at position $z_j$ on the eigenvectors $U_i$, i.e. $\Lambda_{ij} = u_i^T H_j u_i$ (one value per eigenvector).
- Similarly, $\Lambda_j$ collects the corresponding terms at position $z_j$ itself, i.e. the eigenvalues of $H_j$.
- We compute the Pearson correlation coefficient $\mathrm{corr}(\Lambda_{ij}, \Lambda_j)$, which measures the consistency of the metric tensor between the two positions (see the sketch below).
Because the spectrum usually spans several orders of magnitude, the authors compute the correlation on a log scale.
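A sketch of this consistency measure for a single pair of positions, again reusing the Hessian routine from Section 3.1; `metric_consistency` is a name introduced here for illustration.

```python
import torch

def metric_consistency(H_i, H_j, eps=1e-12):
    """Pearson correlation (log scale) between H_j evaluated on H_i's eigenvectors
    and H_j's own eigenvalues."""
    _, U_i = torch.linalg.eigh(H_i)                          # eigenvectors of H_i (columns)
    lam_j, _ = torch.linalg.eigh(H_j)                        # eigenvalues of H_j
    lam_ij = torch.einsum("kn,nm,mk->k", U_i.T, H_j, U_i)    # u_k^T H_j u_k for each k
    x = torch.log10(lam_ij.clamp_min(eps))
    y = torch.log10(lam_j.clamp_min(eps))
    return torch.corrcoef(torch.stack([x, y]))[0, 1]

# z_i, z_j = torch.randn(1, 128), torch.randn(1, 128)
# r = metric_consistency(hessian_backward(G, D, z_i), hessian_backward(G, D, z_j))
```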


[W&B run: lilac-firefly-2]


Takeaway

This shows that the local directions that induce image changes of different orders of magnitude are highly consistent at different points in the latent space. Because of this, the notion of a "global" Hessian makes sense.
As latent space gets warped and mapped into image space, directions in latent space are scaled differently by the Jacobian of the map. Picture by Sayantan Das


6. Conclusion

In this work, the authors developed an efficient and architecture-agnostic way to compute the geometry of the manifold learnt by generative networks. The method discovers the axes accounting for the largest variation in image transformation, which frequently correspond to semantically interpretable changes.
This geometric method can thus facilitate image manipulation, increase explainability, and accelerate optimization on the manifold (with or without gradients).


Note

I suggest that readers looking for a more in-depth treatment of this paper thoroughly check out Section 5 of the paper and the remainder of the Appendix.
Readers can check out the remainder of the wandb runs that are not included in this report here, and feel free to drop questions in the comment box below.