The Geometry of Deep Generative Image Models
Interpretable GANs using Hessian EigenDecomposition
Contents
- TLDR
- The Metric Tensor
- Properties of the Hessian matrix
- Top eigenvectors capture significant image changes
- GAN Latent Spaces are highly anisotropic
- Conclusion

FigA. How a generator works. Image created by Sayantan Das
1. TLDR
This paper aims to make GAN inversion and interpretability more tractable. Through the lens of differential geometry, the authors show that the metric tensor of the latent space has useful interpretability properties -- by realising it as the Hessian matrix of the image distance metric (LPIPS in this case). Experiments show that perturbing latent codes along the eigenvectors of this Hessian gives controllable generation in image space. Moreover, the ranking of the eigenvectors is meaningful: those with larger eigenvalues lead to more significant image changes.
2. The Metric Tensor
2.1 Establishing Riemannian Geometry
Because a generator provides a smooth map onto image space, one relevant conceptual model for GAN latent space is a Riemannian manifold.
To define a Riemannian geometry, we need to have a smooth map and a notion of distance on it, defined by the metric tensor. For image applications, the relevant notion of distance is in image space rather than code space. Thus, we can pull back the distance function on the image space onto the latent space. Differentiating this distance function on latent space, we will get a differential geometric structure (Riemannian metric) on the image manifold.
A GAN generator $G$ parameterizes a submanifold in the image space via $I = G(z)$. Thus, given a distance function $D(\cdot,\cdot)$ in image space, we can define the distance between two latent codes as the distance between the images they generate; i.e. pull back the distance function to latent space through $G$:

$d^2(z_1, z_2) := D(G(z_1), G(z_2))$.
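A minimal sketch of this pullback in code, assuming a pretrained PyTorch generator `G` (any `torch.nn.Module` mapping latent codes to images in $[-1, 1]$) and the `lpips` package as the image distance $D$; both names are placeholders for whatever model and metric you use:

```python
import torch
import lpips  # pip install lpips -- the perceptual distance used in the paper

# Assumption: G is a torch.nn.Module mapping latent codes (N, latent_dim)
# to images (N, 3, H, W) in [-1, 1], e.g. a pretrained DCGAN generator.
lpips_dist = lpips.LPIPS(net='squeeze')

def latent_distance(G, z1, z2):
    """Pull back the image-space distance D onto latent space: d(z1, z2) = D(G(z1), G(z2))."""
    return lpips_dist(G(z1), G(z2)).squeeze()
```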
2.2 Enter Hessian
According to Palais (1957), the Hessian matrix (the matrix of second-order partial derivatives) of the squared distance can be seen as the metric tensor of the image manifold.
Consider the squared distance $d^2(z_0, z)$ with $z_0$ fixed, as a function of $z$. Obviously $z = z_0$ is a local minimum of this function; thus it can be locally approximated by a positive semi-definite quadratic form $\tfrac{1}{2}\,\delta z^{T} H(z_0)\,\delta z$, where $H(z_0)$ is the Hessian of $d^2(z_0, \cdot)$ at $z_0$ and $\delta z = z - z_0$.
This squared vector norm approximates the squared image distance,

$D(G(z_0), G(z_0 + \delta z)) \approx \tfrac{1}{2}\,\delta z^{T} H(z_0)\,\delta z$.
In this paper, the Hessian matrix plays the role of the metric tensor, so this report uses the two terms interchangeably.
3. Properties of the Hessian matrix
We will call $v^{T} H(z)\, v$ the speed of image change along the direction $v$, as measured by the image distance metric $D$.
3.1 Numerical Method
The Learned Perceptual Image Patch Similarity (LPIPS) metric is used as the image distance $D$, since it is twice differentiable, which makes it possible to compute $H$.
We compute the Hessian by building a computational graph for the gradient $\partial_z d^2$ and then computing the gradient of each of its elements with respect to $z$. This method computes $H$ column by column, so its time complexity is proportional to the latent-space dimensionality times the backpropagation time through this graph.
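Below is a minimal PyTorch sketch of this column-by-column scheme; `dist_fn` is assumed to be the scalar squared image distance $z \mapsto d^2(z_0, z)$ (e.g. LPIPS between $G(z_0)$ and $G(z)$, as in the snippet above):

```python
import torch

def full_hessian(dist_fn, z):
    """Compute H = d^2 f / dz^2 column by column via double backprop.

    dist_fn(z) returns the scalar squared image distance to a fixed reference code;
    the cost scales with the latent dimensionality times one backward pass.
    """
    z = z.detach().requires_grad_(True)
    # First pass: build a computational graph for the gradient df/dz.
    (grad,) = torch.autograd.grad(dist_fn(z), z, create_graph=True)
    grad = grad.flatten()
    cols = []
    for i in range(grad.numel()):
        # Second pass: the gradient of each gradient entry is one Hessian column.
        (col,) = torch.autograd.grad(grad[i], z, retain_graph=True)
        cols.append(col.flatten())
    return torch.stack(cols, dim=1)
```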
The Hessian is a linear operator, which can be defined implicitly as long as one can compute the Hessian-vector product (HVP).
Since the gradient with respect to $z$ commutes with the inner product with a fixed vector $v$, the HVP $Hv$ can be rewritten either as the gradient of $v^{T}\partial_z d^2$ with respect to $z$, or as the directional derivative of the gradient $\partial_z d^2$ along $v$.
The first form is easy to compute with reverse-mode autodiff, and the directional derivative is easy to compute with forward-mode autodiff or finite differences. Lanczos iterations are then applied to the HVP operator defined in either of these two ways to solve for the largest eigenpairs, from which an approximate Hessian matrix can be reconstructed.
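A hedged sketch of the reverse-mode HVP variant, handing the operator to SciPy's Lanczos solver `eigsh`; as before, `dist_fn` is an assumed scalar distance function of the latent code:

```python
import numpy as np
import torch
from scipy.sparse.linalg import LinearOperator, eigsh

def top_eigenpairs(dist_fn, z0, k=20):
    """Approximate the top-k eigenpairs of H without ever forming it explicitly."""
    z = z0.detach().clone().requires_grad_(True)
    # Keep the graph of the gradient so we can backprop through it repeatedly.
    (grad,) = torch.autograd.grad(dist_fn(z), z, create_graph=True)
    grad = grad.flatten()

    def hvp(v):
        v_t = torch.from_numpy(v).to(grad)
        # H v = d/dz (grad . v): one extra reverse-mode pass per product.
        (hv,) = torch.autograd.grad(grad @ v_t, z, retain_graph=True)
        return hv.detach().cpu().flatten().double().numpy()

    n = grad.numel()
    op = LinearOperator((n, n), matvec=hvp, dtype=np.float64)
    eigvals, eigvecs = eigsh(op, k=k, which='LA')   # Lanczos iterations
    return eigvals[::-1], eigvecs[:, ::-1]          # sorted in descending order
```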
3.2 Connection to Jacobian
The latent space, together with the feature map $g$ of a middle layer in the generator, defines a manifold in feature space. The metric tensor of this manifold can be derived as the Hessian of the squared feature distance $d^2(z_0, z) = \|g(z) - g(z_0)\|^2$.
Note that there is a simple relationship between the Hessian $H$ of this squared distance and the Jacobian $J = \partial g / \partial z$ of the feature map:

$H(z_0) = 2\, J^{T} J$.

Through this we know that the eigenspectrum of the Hessian matrix $H$ is (twice) the square of the singular value spectrum of the Jacobian $J$, and that the eigenvectors of $H$ are the same as the right singular vectors of $J$.
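A small sketch of this correspondence, with a toy feature map standing in for a real generator layer (all names are illustrative):

```python
import torch

def hessian_spectrum_via_jacobian(g, z0):
    """For d^2(z0, z) = ||g(z) - g(z0)||^2, H(z0) = 2 J^T J with J = dg/dz,
    so the Hessian eigenvalues are 2 * (singular values of J)^2 and its
    eigenvectors are the right singular vectors of J."""
    J = torch.autograd.functional.jacobian(g, z0)        # shape: (feature_dim, latent_dim)
    _, S, Vh = torch.linalg.svd(J, full_matrices=False)
    return 2.0 * S**2, Vh.T

# Toy check with a fixed random nonlinear feature map:
W = torch.randn(256, 64)
g = lambda z: torch.tanh(W @ z)
eigvals, eigvecs = hessian_spectrum_via_jacobian(g, torch.randn(64))
```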
4. Top eigenvectors capture significant image changes
Steps
- Pick a latent code $z_0$ randomly.
- Compute the Hessian $H(z_0)$.
- Perform an eigendecomposition of $H(z_0)$ to obtain eigenvectors $v_k$ ranked by eigenvalue.
- Explore the generated images $G(z_0 + \lambda v_k)$ along each eigenvector (a minimal sketch follows this list).
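A minimal sketch of the exploration step, assuming a generator `G` and an eigenvector `v` obtained from one of the Hessian routines sketched earlier (the amplitudes are arbitrary):

```python
import torch

def explore_eigenvector(G, z0, v, amplitudes=(-4.0, -2.0, 0.0, 2.0, 4.0)):
    """Generate a row of images along z0 + lam * v for one Hessian eigenvector v."""
    v = torch.as_tensor(v, dtype=z0.dtype, device=z0.device).reshape(z0.shape)
    frames = []
    with torch.no_grad():
        for lam in amplitudes:
            frames.append(G((z0 + lam * v).unsqueeze(0)))
    return torch.cat(frames, dim=0)   # inspect visually or score with LPIPS
```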
Observation
- Directions with larger eigenvalues produce larger image changes -- both by visual inspection and as measured by LPIPS.
- Eigenvectors at different ranks encode different types of changes.

Extracted from the paper. Images change at different rates along top vs bottom eigenvectors
5. GAN Latent Spaces are highly anisotropic

FigB. Method inspired by Kornblith et al. (2019) to gauge the global geometry of the latent space.
Steps
- Randomly sample 100-1000 latent codes $z$ and compute $H(z)$ at each one via backprop (a minimal sketch follows this list).
- Perform an eigendecomposition of each $H(z)$.
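A hedged sketch of this sampling loop, practical only for small latent dimensionalities since it forms the full Hessian; `dist_to(z0)` is an assumed factory returning the scalar distance function $z \mapsto d^2(z_0, z)$:

```python
import torch

def spectra_over_latent_space(dist_to, n_samples=100, latent_dim=128):
    """Collect sorted Hessian eigenvalue spectra at many random latent codes.

    dist_to(z0) returns a scalar function z -> d^2(z0, z) (e.g. LPIPS between
    G(z0) and G(z)); a steep drop in every sorted spectrum is the anisotropy
    described here."""
    spectra = []
    for _ in range(n_samples):
        z0 = torch.randn(latent_dim)
        H = torch.autograd.functional.hessian(dist_to(z0), z0)   # full Hessian
        spectra.append(torch.linalg.eigvalsh(H).flip(0))         # descending eigenvalues
    return spectra
```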
Observation
Only a small fraction of dimensions are responsible for large image changes, as shown by the sudden dip in the eigenvalue spectra. Below are some of these spectra, plotted against dimension, for layers of a DCGAN generator.
The authors measure the speed of image change along a vector $v$ as $v^{T} H v$, which they use to further strengthen their argument about the anisotropy of the latent space by showing analytically (more in Appendix A.6) that the variance of $v^{T} H v$ over random unit vectors $v$ is smaller than the variance among the eigenvalues of $H$.
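As a quick numerical illustration of this variance argument (a synthetic positive semi-definite matrix stands in for a real Hessian; this is a sanity check, not the paper's analytic derivation):

```python
import torch

torch.manual_seed(0)
eigvals = torch.logspace(0, -6, 128)            # spectrum spanning six orders of magnitude
Q, _ = torch.linalg.qr(torch.randn(128, 128))
H = Q @ torch.diag(eigvals) @ Q.T               # synthetic anisotropic "Hessian"

v = torch.randn(10000, 128)
v = v / v.norm(dim=1, keepdim=True)             # random unit directions
speeds = torch.einsum('bi,ij,bj->b', v, H, v)   # v^T H v for each direction

# The speeds along random directions vary far less than the eigenvalues themselves.
print(speeds.var(), eigvals.var())
```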
5.1 Comments on the metric tensor's global geometry
The Hessian $H(z)$, or the metric tensor, provides information about the local geometry. To inspect the global consistency of the metric tensor, the authors employ the Pearson correlation coefficient as a workaround, in the following way:
- At position $z_i$ we compute $H(z_i)$; from its eigendecomposition we obtain eigenvectors $v_k(z_i)$ with eigenvalues $\lambda_k(z_i)$.
- We compute $v_k(z_i)^{T} H(z_j)\, v_k(z_i)$, which is the effect of the metric tensor at position $z_j$ on the eigenvectors obtained at position $z_i$.
- Similarly, $\lambda_k(z_j)$ and $v_k(z_j)$ stand for all the terms pertaining to position $z_j$, and so on.
- The Pearson correlation coefficient between $\{\lambda_k(z_i)\}_k$ and $\{v_k(z_i)^{T} H(z_j)\, v_k(z_i)\}_k$ then physically measures the consistency of the metric tensor across the two positions.
As the spectrum usually spanned several orders of magnitude, the authors computed the correlation on the log scale.
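A small sketch of this consistency check, assuming the two Hessians have already been computed as dense matrices (all names are illustrative):

```python
import numpy as np

def metric_consistency(H_i, H_j):
    """Pearson correlation, on a log scale, between the eigenvalues of H at
    position i and the speeds v^T H_j v of those same eigenvectors under the
    metric at position j."""
    lam_i, V_i = np.linalg.eigh(H_i)                   # eigenpairs at position i
    lam_ij = np.einsum('ki,kl,li->i', V_i, H_j, V_i)   # v_k^T H_j v_k for every k
    eps = 1e-12                                        # guard against log of non-positive values
    lam_i, lam_ij = np.maximum(lam_i, eps), np.maximum(lam_ij, eps)
    return np.corrcoef(np.log(lam_i), np.log(lam_ij))[0, 1]
```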
Takeaway
This shows that the local directions that induce image changes of different orders of magnitude are highly consistent at different points in the latent space. Because of this, the notion of a "global" Hessian makes sense.

As the latent space gets warped and mapped into image space, directions in the latent space are scaled differently by the Jacobian of the map. Picture by Sayantan Das
Conclusion
In this work, the authors developed an efficient and architecture-agnostic way to compute the geometry of the manifold learnt by generative networks. This method discovers the axes accounting for the largest variation in image transformation, which frequently represent semantically interpretable changes.
Subsequently, this geometric method can facilitate image manipulation, increase explainability, and accelerate optimization on the manifold (with or without gradients).
Note
I suggest that readers looking for a more in-depth understanding of this paper thoroughly check out Section 5 of the paper and the remainder of the appendix.
Readers can check out the remainder of the wandb runs that are not included in this report here, and feel free to drop questions in the comment box below.